image


Intel® 64 and IA-32 Architectures Software Developer’s Manual

Volume 2A: Instruction Set Reference, A-L


NOTE: The Intel® 64 and IA-32 Architectures Software Developer's Manual consists of ten volumes: Basic Architecture, Order Number 253665; Instruction Set Reference A-L, Order Number 253666; Instruction Set Reference M-U, Order Number 253667; Instruction Set Reference V-Z, Order Number 326018; Instruction Set Reference, Order Number 334569; System Programming Guide, Part 1, Order Number 253668; System Programming Guide, Part 2, Order Number 253669; System Programming Guide, Part 3, Order Number 326019; System Programming Guide, Part 4, Order Number 332831; Model-Specific Registers, Order Number 335592. Refer to all ten volumes when evaluating your design needs.


Order Number: 253666-067US

May 2018


Intel technologies features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer.

No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses.

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifica- tions. Current characterized errata are available on request.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1- 800-548-4725, or by visiting http://www.intel.com/design/literature.htm.

Intel, the Intel logo, Intel Atom, Intel Core, Intel SpeedStep, MMX, Pentium, VTune, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others. Copyright © 1997-2018, Intel Corporation. All Rights Reserved.

CONTENTS

PAGE


CHAPTER 1

ABOUT THIS MANUAL

    1. INTEL® 64 AND IA-32 PROCESSORS COVERED IN THIS MANUAL 1-1

    2. OVERVIEW OF VOLUME 2A, 2B, 2C AND 2D: INSTRUCTION SET REFERENCE 1-4

    3. NOTATIONAL CONVENTIONS 1-4

      1. Bit and Byte Order 1-4

      2. Reserved Bits and Software Compatibility 1-5

      3. Instruction Operands 1-5

      4. Hexadecimal and Binary Numbers 1-6

      5. Segmented Addressing 1-6

      6. Exceptions 1-6

      7. A New Syntax for CPUID, CR, and MSR Values 1-6

1.4 RELATED LITERATURE 1-7

CHAPTER 2 INSTRUCTION FORMAT

    1. INSTRUCTION FORMAT FOR PROTECTED MODE, REAL-ADDRESS MODE, AND VIRTUAL-8086 MODE 2-1

      1. Instruction Prefixes 2-1

      2. Opcodes 2-3

      3. ModR/M and SIB Bytes 2-3

      4. Displacement and Immediate Bytes 2-3

      5. Addressing-Mode Encoding of ModR/M and SIB Bytes 2-4

    1. IA-32E MODE 2-7

      1. REX Prefixes 2-8

        1. Encoding 2-8

        2. More on REX Prefix Fields 2-8

        3. Displacement 2-11

        4. Direct Memory-Offset MOVs 2-11

        5. Immediates 2-11

        6. RIP-Relative Addressing 2-12

        7. Default 64-Bit Operand Size 2-12

2.2.2 Additional Encodings for Control and Debug Registers 2-12

    1. INTEL® ADVANCED VECTOR EXTENSIONS (INTEL® AVX) 2-13

      1. Instruction Format 2-13

      2. VEX and the LOCK prefix 2-13

      3. VEX and the 66H, F2H, and F3H prefixes 2-13

      4. VEX and the REX prefix 2-13

      5. The VEX Prefix 2-14

        2.3.5.1 VEX Byte 0, bits[7:0] 2-15

        2.3.5.2 VEX Byte 1, bit [7] - ‘R’ 2-15

        1. 3-byte VEX byte 1, bit[6] - ‘X’ 2-16

        2. 3-byte VEX byte 1, bit[5] - ‘B’ 2-16

        3. 3-byte VEX byte 2, bit[7] - ‘W’ 2-16

        4. 2-byte VEX Byte 1, bits[6:3] and 3-byte VEX Byte 2, bits [6:3]- ‘vvvv’ the Source or Dest Register Specifier 2-16

      1. Instruction Operand Encoding and VEX.vvvv, ModR/M 2-17

        1. 3-byte VEX byte 1, bits[4:0] - “m-mmmm” 2-18

        2. 2-byte VEX byte 1, bit[2], and 3-byte VEX byte 2, bit [2]- “L” 2-18

        3. 2-byte VEX byte 1, bits[1:0], and 3-byte VEX byte 2, bits [1:0]- “pp” 2-18

      2. The Opcode Byte 2-19

      3. The MODRM, SIB, and Displacement Bytes 2-19

      4. The Third Source Operand (Immediate Byte) 2-19

      5. AVX Instructions and the Upper 128-bits of YMM registers 2-19

        1. Vector Length Transition and Programming Considerations 2-19

      1. AVX Instruction Length 2-20

      2. Vector SIB (VSIB) Memory Addressing 2-20

2.3.12.1 64-bit Mode VSIB Memory Addressing 2-21

    1. AVX AND SSE INSTRUCTION EXCEPTION SPECIFICATION 2-21

      1. Exceptions Type 1 (Aligned memory reference) 2-26

      2. Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned) 2-27

      3. Exceptions Type 3 (<16 Byte memory argument) 2-28

      4. Exceptions Type 4 (>=16 Byte mem arg no alignment, no floating-point exceptions) 2-29

      5. Exceptions Type 5 (<16 Byte mem arg and no FP exceptions) 2-30

      6. Exceptions Type 6 (VEX-Encoded Instructions Without Legacy SSE Analogues) 2-31

      7. Exceptions Type 7 (No FP exceptions, no memory arg) 2-32

      8. Exceptions Type 8 (AVX and no memory argument) 2-32

      9. Exception Type 11 (VEX-only, mem arg no AC, floating-point exceptions) 2-33

      10. Exception Type 12 (VEX-only, VSIB mem arg, no AC, no floating-point exceptions) 2-34

2.5 VEX ENCODING SUPPORT FOR GPR INSTRUCTIONS 2-34

2.5.1 Exception Conditions for VEX-Encoded GPR Instructions 2-35

    1. INTEL® AVX-512 ENCODING 2-35

      1. Instruction Format and EVEX 2-36

      2. Register Specifier Encoding and EVEX 2-38

      3. Opmask Register Encoding 2-38

      4. Masking Support in EVEX 2-39

      5. Compressed Displacement (disp8*N) Support in EVEX 2-39

      6. EVEX Encoding of Broadcast/Rounding/SAE Support 2-41

      7. Embedded Broadcast Support in EVEX 2-41

      8. Static Rounding Support in EVEX 2-41

      9. SAE Support in EVEX 2-41

      10. Vector Length Orthogonality 2-41

      11. #UD Equations for EVEX 2-42

        1. State Dependent #UD 2-42

        2. Opcode Independent #UD 2-42

        3. Opcode Dependent #UD 2-43

      1. Device Not Available 2-44

      2. Scalar Instructions 2-44

    1. EXCEPTION CLASSIFICATIONS OF EVEX-ENCODED INSTRUCTIONS 2-44

      1. Exceptions Type E1 and E1NF of EVEX-Encoded Instructions 2-48

      2. Exceptions Type E2 of EVEX-Encoded Instructions 2-50

      3. Exceptions Type E3 and E3NF of EVEX-Encoded Instructions 2-51

      4. Exceptions Type E4 and E4NF of EVEX-Encoded Instructions 2-53

      5. Exceptions Type E5 and E5NF 2-55

      6. Exceptions Type E6 and E6NF 2-57

      7. Exceptions Type E7NM 2-59

      8. Exceptions Type E9 and E9NF 2-60

      9. Exceptions Type E10 2-62

      10. Exception Type E11 (EVEX-only, mem arg no AC, floating-point exceptions) 2-64

      11. Exception Type E12 and E12NP (VSIB mem arg, no AC, no floating-point exceptions) 2-65

2.8 EXCEPTION CLASSIFICATIONS OF OPMASK INSTRUCTIONS 2-67

CHAPTER 3

INSTRUCTION SET REFERENCE, A-L

    1. INTERPRETING THE INSTRUCTION REFERENCE PAGES 3-1

      1. Instruction Format 3-1

        1. Opcode Column in the Instruction Summary Table (Instructions without VEX Prefix) 3-2

        2. Opcode Column in the Instruction Summary Table (Instructions with VEX prefix) 3-3

        3. Instruction Column in the Opcode Summary Table 3-5

        4. Operand Encoding Column in the Instruction Summary Table 3-8

        5. 64/32-bit Mode Column in the Instruction Summary Table 3-8

        6. CPUID Support Column in the Instruction Summary Table 3-10

        7. Description Column in the Instruction Summary Table 3-10

        8. Description Section 3-10

        9. Operation Section 3-10

        10. Intel® C/C++ Compiler Intrinsics Equivalents Section 3-13

        11. Flags Affected Section 3-15

        12. FPU Flags Affected Section 3-15

        13. Protected Mode Exceptions Section 3-15

        14. Real-Address Mode Exceptions Section 3-16

        15. Virtual-8086 Mode Exceptions Section 3-16

        16. Floating-Point Exceptions Section 3-17

        17. SIMD Floating-Point Exceptions Section 3-17

        18. Compatibility Mode Exceptions Section 3-17

        19. 64-Bit Mode Exceptions Section 3-17

3.2 INSTRUCTIONS (A-L) 3-18

AAA—ASCII Adjust After Addition 3-19

AAD—ASCII Adjust AX Before Division 3-21

AAM—ASCII Adjust AX After Multiply 3-23

AAS—ASCII Adjust AL After Subtraction 3-25

ADC—Add with Carry 3-27

ADCX — Unsigned Integer Addition of Two Operands with Carry Flag 3-30

ADD—Add 3-32

ADDPD—Add Packed Double-Precision Floating-Point Values 3-34

ADDPS—Add Packed Single-Precision Floating-Point Values 3-37

ADDSD—Add Scalar Double-Precision Floating-Point Values 3-40

ADDSS—Add Scalar Single-Precision Floating-Point Values 3-42

ADDSUBPD—Packed Double-FP Add/Subtract 3-44

ADDSUBPS—Packed Single-FP Add/Subtract 3-46

ADOX — Unsigned Integer Addition of Two Operands with Overflow Flag 3-49

AESDEC—Perform One Round of an AES Decryption Flow 3-51

AESDECLAST—Perform Last Round of an AES Decryption Flow 3-53

AESENC—Perform One Round of an AES Encryption Flow 3-55

AESENCLAST—Perform Last Round of an AES Encryption Flow 3-57

AESIMC—Perform the AES InvMixColumn Transformation 3-59

AESKEYGENASSIST—AES Round Key Generation Assist 3-60

AND—Logical AND 3-62

ANDN — Logical AND NOT 3-64

ANDPD—Bitwise Logical AND of Packed Double Precision Floating-Point Values 3-65

ANDPS—Bitwise Logical AND of Packed Single Precision Floating-Point Values 3-68

ANDNPD—Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values 3-71

ANDNPS—Bitwise Logical AND NOT of Packed Single Precision Floating-Point Values 3-74

ARPL—Adjust RPL Field of Segment Selector 3-77

BLENDPD — Blend Packed Double Precision Floating-Point Values 3-79

BEXTR — Bit Field Extract 3-81

BLENDPS — Blend Packed Single Precision Floating-Point Values 3-82

BLENDVPD — Variable Blend Packed Double Precision Floating-Point Values 3-84

BLENDVPS — Variable Blend Packed Single Precision Floating-Point Values 3-86

BLSI — Extract Lowest Set Isolated Bit 3-89

BLSMSK — Get Mask Up to Lowest Set Bit 3-90

BLSR — Reset Lowest Set Bit 3-91

BNDCL—Check Lower Bound 3-92

BNDCU/BNDCN—Check Upper Bound 3-94

BNDLDX—Load Extended Bounds Using Address Translation 3-96

BNDMK—Make Bounds 3-99

BNDMOV—Move Bounds 3-101

BNDSTX—Store Extended Bounds Using Address Translation 3-104

BOUND—Check Array Index Against Bounds 3-107

BSF—Bit Scan Forward 3-109

BSR—Bit Scan Reverse 3-111

BSWAP—Byte Swap 3-113

BT—Bit Test 3-114

BTC—Bit Test and Complement 3-116

BTR—Bit Test and Reset 3-118

BTS—Bit Test and Set 3-120

BZHI — Zero High Bits Starting with Specified Bit Position 3-122

CALL—Call Procedure 3-123

CBW/CWDE/CDQE—Convert Byte to Word/Convert Word to Doubleword/Convert Doubleword to Quadword 3-136

CLAC—Clear AC Flag in EFLAGS Register 3-137

CLC—Clear Carry Flag 3-138

CLD—Clear Direction Flag 3-139

CLFLUSH—Flush Cache Line 3-140

CLFLUSHOPT—Flush Cache Line Optimized 3-142

CLI — Clear Interrupt Flag 3-144

CLTS—Clear Task-Switched Flag in CR0 3-146

CLWB—Cache Line Write Back 3-147

CMC—Complement Carry Flag 3-149

CMOVcc—Conditional Move 3-150

CMP—Compare Two Operands 3-154

CMPPD—Compare Packed Double-Precision Floating-Point Values 3-156

CMPPS—Compare Packed Single-Precision Floating-Point Values 3-163

CMPS/CMPSB/CMPSW/CMPSD/CMPSQ—Compare String Operands 3-170

CMPSD—Compare Scalar Double-Precision Floating-Point Value 3-174

CMPSS—Compare Scalar Single-Precision Floating-Point Value 3-178

CMPXCHG—Compare and Exchange 3-182

CMPXCHG8B/CMPXCHG16B—Compare and Exchange Bytes 3-184

COMISD—Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS 3-187

COMISS—Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS 3-189

CPUID—CPU Identification 3-191

CRC32 — Accumulate CRC32 Value 3-228

CVTDQ2PD—Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values 3-231

CVTDQ2PS—Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values 3-235

CVTPD2DQ—Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers 3-238

CVTPD2PI—Convert Packed Double-Precision FP Values to Packed Dword Integers 3-242

CVTPD2PS—Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point

Values 3-243

CVTPI2PD—Convert Packed Dword Integers to Packed Double-Precision FP Values 3-247

CVTPI2PS—Convert Packed Dword Integers to Packed Single-Precision FP Values 3-248

CVTPS2DQ—Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer

Values 3-249

CVTPS2PD—Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point

Values 3-252

CVTPS2PI—Convert Packed Single-Precision FP Values to Packed Dword Integers 3-255

CVTSD2SI—Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer 3-256

CVTSD2SS—Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value. .3-258 CVTSI2SD—Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value 3-260

CVTSI2SS—Convert Doubleword Integer to Scalar Single-Precision Floating-Point Value 3-262

CVTSS2SD—Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value. .3-264 CVTSS2SI—Convert Scalar Single-Precision Floating-Point Value to Doubleword Integer 3-266

CVTTPD2DQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword

Integers 3-268

CVTTPD2PI—Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers 3-272

CVTTPS2DQ—Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values 3-273

CVTTPS2PI—Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers 3-276

CVTTSD2SI—Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Integer 3-277

CVTTSS2SI—Convert with Truncation Scalar Single-Precision Floating-Point Value to Integer 3-279

CWD/CDQ/CQO—Convert Word to Doubleword/Convert Doubleword to Quadword 3-281

DAA—Decimal Adjust AL after Addition 3-282

DAS—Decimal Adjust AL after Subtraction 3-284

DEC—Decrement by 1 3-286

DIV—Unsigned Divide 3-288

DIVPD—Divide Packed Double-Precision Floating-Point Values 3-291

DIVPS—Divide Packed Single-Precision Floating-Point Values 3-294

DIVSD—Divide Scalar Double-Precision Floating-Point Value 3-297

DIVSS—Divide Scalar Single-Precision Floating-Point Values 3-299

DPPD — Dot Product of Packed Double Precision Floating-Point Values 3-301

DPPS — Dot Product of Packed Single Precision Floating-Point Values 3-303

EMMS—Empty MMX Technology State 3-306

ENTER—Make Stack Frame for Procedure Parameters 3-307

EXTRACTPS—Extract Packed Floating-Point Values 3-310

F2XM1—Compute 2x–1 3-312

FABS—Absolute Value 3-314

FADD/FADDP/FIADD—Add 3-315

FBLD—Load Binary Coded Decimal 3-318

FBSTP—Store BCD Integer and Pop 3-320

FCHS—Change Sign 3-322

FCLEX/FNCLEX—Clear Exceptions 3-324

FCMOVcc—Floating-Point Conditional Move 3-326

FCOM/FCOMP/FCOMPP—Compare Floating Point Values 3-328

FCOMI/FCOMIP/ FUCOMI/FUCOMIP—Compare Floating Point Values and Set EFLAGS 3-331

FCOS— Cosine 3-334

FDECSTP—Decrement Stack-Top Pointer 3-336

FDIV/FDIVP/FIDIV—Divide 3-337

FDIVR/FDIVRP/FIDIVR—Reverse Divide 3-340

FFREE—Free Floating-Point Register 3-343

FICOM/FICOMP—Compare Integer 3-344

FILD—Load Integer 3-346

FINCSTP—Increment Stack-Top Pointer 3-348

FINIT/FNINIT—Initialize Floating-Point Unit 3-349

FIST/FISTP—Store Integer 3-351

FISTTP—Store Integer with Truncation 3-354

FLD—Load Floating Point Value 3-356

FLD1/FLDL2T/FLDL2E/FLDPI/FLDLG2/FLDLN2/FLDZ—Load Constant 3-358

FLDCW—Load x87 FPU Control Word 3-360

FLDENV—Load x87 FPU Environment 3-362

FMUL/FMULP/FIMUL—Multiply 3-364

FNOP—No Operation 3-367

FPATAN—Partial Arctangent 3-368

FPREM—Partial Remainder 3-370

FPREM1—Partial Remainder 3-372

FPTAN—Partial Tangent 3-374

FRNDINT—Round to Integer 3-376

FRSTOR—Restore x87 FPU State 3-377

FSAVE/FNSAVE—Store x87 FPU State 3-379

FSCALE—Scale 3-382

FSIN—Sine 3-384

FSINCOS—Sine and Cosine 3-386

FSQRT—Square Root 3-388

FST/FSTP—Store Floating Point Value 3-390

FSTCW/FNSTCW—Store x87 FPU Control Word 3-392

FSTENV/FNSTENV—Store x87 FPU Environment 3-394

FSTSW/FNSTSW—Store x87 FPU Status Word 3-396

FSUB/FSUBP/FISUB—Subtract 3-398

FSUBR/FSUBRP/FISUBR—Reverse Subtract 3-401

FTST—TEST 3-404

FUCOM/FUCOMP/FUCOMPP—Unordered Compare Floating Point Values 3-406

FXAM—Examine Floating-Point 3-409

FXCH—Exchange Register Contents 3-411

FXRSTOR—Restore x87 FPU, MMX, XMM, and MXCSR State 3-413

FXSAVE—Save x87 FPU, MMX Technology, and SSE State 3-416

FXTRACT—Extract Exponent and Significand 3-424

FYL2X—Compute y * log2x 3-426

FYL2XP1—Compute y * log2(x +1) 3-428

HADDPD—Packed Double-FP Horizontal Add 3-430

HADDPS—Packed Single-FP Horizontal Add 3-433

HLT—Halt 3-436

HSUBPD—Packed Double-FP Horizontal Subtract 3-437

HSUBPS—Packed Single-FP Horizontal Subtract 3-440

IDIV—Signed Divide 3-443

IMUL—Signed Multiply 3-446

IN—Input from Port 3-450

INC—Increment by 1 3-452

INS/INSB/INSW/INSD—Input from Port to String 3-454

INSERTPS—Insert Scalar Single-Precision Floating-Point Value 3-457

INT n/INTO/INT3/INT1—Call to Interrupt Procedure 3-460

INVD—Invalidate Internal Caches 3-473

INVLPG—Invalidate TLB Entries 3-475

INVPCID—Invalidate Process-Context Identifier 3-477

IRET/IRETD—Interrupt Return 3-480

Jcc—Jump if Condition Is Met 3-487

JMP—Jump 3-492

KADDW/KADDB/KADDQ/KADDD—ADD Two Masks 3-500

KANDW/KANDB/KANDQ/KANDD—Bitwise Logical AND Masks 3-501

KANDNW/KANDNB/KANDNQ/KANDND—Bitwise Logical AND NOT Masks 3-502

KMOVW/KMOVB/KMOVQ/KMOVD—Move from and to Mask Registers 3-503

KNOTW/KNOTB/KNOTQ/KNOTD—NOT Mask Register 3-505

KORW/KORB/KORQ/KORD—Bitwise Logical OR Masks 3-506

KORTESTW/KORTESTB/KORTESTQ/KORTESTD—OR Masks And Set Flags 3-507

KSHIFTLW/KSHIFTLB/KSHIFTLQ/KSHIFTLD—Shift Left Mask Registers 3-509

KSHIFTRW/KSHIFTRB/KSHIFTRQ/KSHIFTRD—Shift Right Mask Registers 3-511

KTESTW/KTESTB/KTESTQ/KTESTD—Packed Bit Test Masks and Set Flags 3-513

KUNPCKBW/KUNPCKWD/KUNPCKDQ—Unpack for Mask Registers 3-515

KXNORW/KXNORB/KXNORQ/KXNORD—Bitwise Logical XNOR Masks 3-516

KXORW/KXORB/KXORQ/KXORD—Bitwise Logical XOR Masks 3-517

LAHF—Load Status Flags into AH Register 3-518

LAR—Load Access Rights Byte 3-519

LDDQU—Load Unaligned Integer 128 Bits 3-522

LDMXCSR—Load MXCSR Register 3-524

LDS/LES/LFS/LGS/LSS—Load Far Pointer 3-525

LEA—Load Effective Address 3-529

LEAVE—High Level Procedure Exit 3-531

LFENCE—Load Fence 3-533

LGDT/LIDT—Load Global/Interrupt Descriptor Table Register 3-534

LLDT—Load Local Descriptor Table Register 3-537

LMSW—Load Machine Status Word 3-539

LOCK—Assert LOCK# Signal Prefix 3-541

LODS/LODSB/LODSW/LODSD/LODSQ—Load String 3-543

LOOP/LOOPcc—Loop According to ECX Counter 3-546

LSL—Load Segment Limit 3-548

LTR—Load Task Register 3-551

LZCNT— Count the Number of Leading Zero Bits 3-553

CHAPTER 4

INSTRUCTION SET REFERENCE, M-U

    1. IMM8 CONTROL BYTE OPERATION FOR PCMPESTRI / PCMPESTRM / PCMPISTRI / PCMPISTRM 4-1

      1. General Description 4-1

      2. Source Data Format 4-2

      3. Aggregation Operation 4-2

      4. Polarity 4-3

        viii Vol. 2A

      5. Output Selection 4-4

      6. Valid/Invalid Override of Comparisons 4-4

      7. Summary of Im8 Control byte 4-5

      8. Diagram Comparison and Aggregation Process 4-6

    1. COMMON TRANSFORMATION AND PRIMITIVE FUNCTIONS FOR SHA1XXX AND SHA256XXX 4-6

    2. INSTRUCTIONS (M-U) 4-7

MASKMOVDQU—Store Selected Bytes of Double Quadword 4-8

MASKMOVQ—Store Selected Bytes of Quadword 4-10

MAXPD—Maximum of Packed Double-Precision Floating-Point Values 4-12

MAXPS—Maximum of Packed Single-Precision Floating-Point Values 4-15

MAXSD—Return Maximum Scalar Double-Precision Floating-Point Value 4-18

MAXSS—Return Maximum Scalar Single-Precision Floating-Point Value 4-20

MFENCE—Memory Fence 4-22

MINPD—Minimum of Packed Double-Precision Floating-Point Values 4-23

MINPS—Minimum of Packed Single-Precision Floating-Point Values 4-26

MINSD—Return Minimum Scalar Double-Precision Floating-Point Value 4-29

MINSS—Return Minimum Scalar Single-Precision Floating-Point Value 4-31

MONITOR—Set Up Monitor Address 4-33

MOV—Move 4-35

MOV—Move to/from Control Registers 4-40

MOV—Move to/from Debug Registers 4-43

MOVAPD—Move Aligned Packed Double-Precision Floating-Point Values 4-45

MOVAPS—Move Aligned Packed Single-Precision Floating-Point Values 4-49

MOVBE—Move Data After Swapping Bytes 4-53

MOVD/MOVQ—Move Doubleword/Move Quadword 4-55

MOVDDUP—Replicate Double FP Values 4-59

MOVDQA,VMOVDQA32/64—Move Aligned Packed Integer Values 4-62

MOVDQU,VMOVDQU8/16/32/64—Move Unaligned Packed Integer Values 4-67

MOVDQ2Q—Move Quadword from XMM to MMX Technology Register 4-75

MOVHLPS—Move Packed Single-Precision Floating-Point Values High to Low 4-76

MOVHPD—Move High Packed Double-Precision Floating-Point Value 4-78

MOVHPS—Move High Packed Single-Precision Floating-Point Values 4-80

MOVLHPS—Move Packed Single-Precision Floating-Point Values Low to High 4-82

MOVLPD—Move Low Packed Double-Precision Floating-Point Value 4-84

MOVLPS—Move Low Packed Single-Precision Floating-Point Values 4-86

MOVMSKPD—Extract Packed Double-Precision Floating-Point Sign Mask 4-88

MOVMSKPS—Extract Packed Single-Precision Floating-Point Sign Mask 4-90

MOVNTDQA—Load Double Quadword Non-Temporal Aligned Hint 4-92

MOVNTDQ—Store Packed Integers Using Non-Temporal Hint 4-94

MOVNTI—Store Doubleword Using Non-Temporal Hint 4-96

MOVNTPD—Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint 4-98

MOVNTPS—Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint 4-100

MOVNTQ—Store of Quadword Using Non-Temporal Hint 4-102

MOVQ—Move Quadword 4-103

MOVQ2DQ—Move Quadword from MMX Technology to XMM Register 4-106

MOVS/MOVSB/MOVSW/MOVSD/MOVSQ—Move Data from String to String 4-107

MOVSD—Move or Merge Scalar Double-Precision Floating-Point Value 4-111

MOVSHDUP—Replicate Single FP Values 4-114

MOVSLDUP—Replicate Single FP Values 4-117

MOVSS—Move or Merge Scalar Single-Precision Floating-Point Value 4-120

MOVSX/MOVSXD—Move with Sign-Extension 4-124

MOVUPD—Move Unaligned Packed Double-Precision Floating-Point Values 4-126

MOVUPS—Move Unaligned Packed Single-Precision Floating-Point Values 4-130

MOVZX—Move with Zero-Extend 4-134

MPSADBW — Compute Multiple Packed Sums of Absolute Difference 4-136

MUL—Unsigned Multiply 4-144

MULPD—Multiply Packed Double-Precision Floating-Point Values 4-146

MULPS—Multiply Packed Single-Precision Floating-Point Values 4-149

MULSD—Multiply Scalar Double-Precision Floating-Point Value 4-152

MULSS—Multiply Scalar Single-Precision Floating-Point Values 4-154

MULX — Unsigned Multiply Without Affecting Flags. 4-156

MWAIT—Monitor Wait 4-158

NEG—Two's Complement Negation 4-161

NOP—No Operation 4-163

NOT—One's Complement Negation 4-164

OR—Logical Inclusive OR 4-166

ORPD—Bitwise Logical OR of Packed Double Precision Floating-Point Values 4-168

ORPS—Bitwise Logical OR of Packed Single Precision Floating-Point Values 4-171

OUT—Output to Port 4-174

OUTS/OUTSB/OUTSW/OUTSD—Output String to Port 4-176

PABSB/PABSW/PABSD/PABSQ — Packed Absolute Value 4-180

PACKSSWB/PACKSSDW—Pack with Signed Saturation 4-186

PACKUSDW—Pack with Unsigned Saturation 4-194

PACKUSWB—Pack with Unsigned Saturation 4-199

PADDB/PADDW/PADDD/PADDQ—Add Packed Integers 4-204

PADDSB/PADDSW—Add Packed Signed Integers with Signed Saturation 4-211

PADDUSB/PADDUSW—Add Packed Unsigned Integers with Unsigned Saturation 4-215

PALIGNR — Packed Align Right 4-219

PAND—Logical AND 4-223

PANDN—Logical AND NOT. 4-226

PAUSE—Spin Loop Hint 4-229

PAVGB/PAVGW—Average Packed Integers 4-230

PBLENDVB — Variable Blend Packed Bytes 4-234

PBLENDW — Blend Packed Words 4-238

PCLMULQDQ — Carry-Less Multiplication Quadword 4-241

PCMPEQB/PCMPEQW/PCMPEQD— Compare Packed Data for Equal 4-244

PCMPEQQ — Compare Packed Qword Data for Equal 4-250

PCMPESTRI — Packed Compare Explicit Length Strings, Return Index 4-253

PCMPESTRM — Packed Compare Explicit Length Strings, Return Mask 4-255

PCMPGTB/PCMPGTW/PCMPGTD—Compare Packed Signed Integers for Greater Than 4-257

PCMPGTQ — Compare Packed Data for Greater Than 4-263

PCMPISTRI — Packed Compare Implicit Length Strings, Return Index 4-266

PCMPISTRM — Packed Compare Implicit Length Strings, Return Mask 4-268

PDEP — Parallel Bits Deposit 4-270

PEXT — Parallel Bits Extract 4-272

PEXTRB/PEXTRD/PEXTRQ — Extract Byte/Dword/Qword 4-274

PEXTRW—Extract Word 4-277

PHADDW/PHADDD — Packed Horizontal Add 4-280

PHADDSW — Packed Horizontal Add and Saturate 4-284

PHMINPOSUW — Packed Horizontal Word Minimum 4-286

PHSUBW/PHSUBD — Packed Horizontal Subtract 4-288

PHSUBSW — Packed Horizontal Subtract and Saturate 4-291

PINSRB/PINSRD/PINSRQ — Insert Byte/Dword/Qword 4-293

PINSRW—Insert Word 4-296

PMADDUBSW — Multiply and Add Packed Signed and Unsigned Bytes 4-298

PMADDWD—Multiply and Add Packed Integers 4-301

PMAXSB/PMAXSW/PMAXSD/PMAXSQ—Maximum of Packed Signed Integers 4-304

PMAXUB/PMAXUW—Maximum of Packed Unsigned Integers 4-311

PMAXUD/PMAXUQ—Maximum of Packed Unsigned Integers 4-316

PMINSB/PMINSW—Minimum of Packed Signed Integers 4-320

PMINSD/PMINSQ—Minimum of Packed Signed Integers 4-325

PMINUB/PMINUW—Minimum of Packed Unsigned Integers 4-329

PMINUD/PMINUQ—Minimum of Packed Unsigned Integers 4-334

PMOVMSKB—Move Byte Mask 4-338

PMOVSX—Packed Move with Sign Extend 4-340

PMOVZX—Packed Move with Zero Extend 4-349

PMULDQ—Multiply Packed Doubleword Integers 4-358

PMULHRSW — Packed Multiply High with Round and Scale 4-361

x Vol. 2A

PMULHUW—Multiply Packed Unsigned Integers and Store High Result 4-365

PMULHW—Multiply Packed Signed Integers and Store High Result 4-369

PMULLD/PMULLQ—Multiply Packed Integers and Store Low Result 4-373

PMULLW—Multiply Packed Signed Integers and Store Low Result 4-377

PMULUDQ—Multiply Packed Unsigned Doubleword Integers 4-381

POP—Pop a Value from the Stack 4-384

POPA/POPAD—Pop All General-Purpose Registers 4-389

POPCNT — Return the Count of Number of Bits Set to 1 4-391

POPF/POPFD/POPFQ—Pop Stack into EFLAGS Register 4-393

POR—Bitwise Logical OR 4-397

PREFETCHh—Prefetch Data Into Caches 4-400

PREFETCHW—Prefetch Data into Caches in Anticipation of a Write 4-402

PSADBW—Compute Sum of Absolute Differences 4-404

PSHUFB — Packed Shuffle Bytes 4-408

PSHUFD—Shuffle Packed Doublewords 4-412

PSHUFHW—Shuffle Packed High Words 4-416

PSHUFLW—Shuffle Packed Low Words 4-419

PSHUFW—Shuffle Packed Words 4-422

PSIGNB/PSIGNW/PSIGND — Packed SIGN 4-423

PSLLDQ—Shift Double Quadword Left Logical 4-427

PSLLW/PSLLD/PSLLQ—Shift Packed Data Left Logical 4-429

PSRAW/PSRAD/PSRAQ—Shift Packed Data Right Arithmetic 4-441

PSRLDQ—Shift Double Quadword Right Logical 4-451

PSRLW/PSRLD/PSRLQ—Shift Packed Data Right Logical 4-453

PSUBB/PSUBW/PSUBD—Subtract Packed Integers 4-465

PSUBQ—Subtract Packed Quadword Integers 4-472

PSUBSB/PSUBSW—Subtract Packed Signed Integers with Signed Saturation 4-475

PSUBUSB/PSUBUSW—Subtract Packed Unsigned Integers with Unsigned Saturation 4-479

PTEST- Logical Compare 4-483

PTWRITE - Write Data to a Processor Trace Packet 4-485

PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ/PUNPCKHQDQ— Unpack High Data 4-487

PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ/PUNPCKLQDQ—Unpack Low Data 4-497

PUSH—Push Word, Doubleword or Quadword Onto the Stack 4-507

PUSHA/PUSHAD—Push All General-Purpose Registers 4-510

PUSHF/PUSHFD/PUSHFQ—Push EFLAGS Register onto the Stack 4-512

PXOR—Logical Exclusive OR 4-514

RCL/RCR/ROL/ROR—Rotate 4-517

RCPPS—Compute Reciprocals of Packed Single-Precision Floating-Point Values 4-522

RCPSS—Compute Reciprocal of Scalar Single-Precision Floating-Point Values 4-524

RDFSBASE/RDGSBASE—Read FS/GS Segment Base 4-526

RDMSR—Read from Model Specific Register 4-528

RDPID—Read Processor ID 4-530

RDPKRU—Read Protection Key Rights for User Pages 4-531

RDPMC—Read Performance-Monitoring Counters 4-533

RDRAND—Read Random Number 4-537

RDSEED—Read Random SEED 4-539

RDTSC—Read Time-Stamp Counter 4-541

RDTSCP—Read Time-Stamp Counter and Processor ID 4-543

REP/REPE/REPZ/REPNE/REPNZ—Repeat String Operation Prefix 4-545

RET—Return from Procedure 4-549

RORX — Rotate Right Logical Without Affecting Flags 4-559

ROUNDPD — Round Packed Double Precision Floating-Point Values 4-560

ROUNDPS — Round Packed Single Precision Floating-Point Values 4-563

ROUNDSD — Round Scalar Double Precision Floating-Point Values 4-566

ROUNDSS — Round Scalar Single Precision Floating-Point Values 4-568

RSM—Resume from System Management Mode 4-570

RSQRTPS—Compute Reciprocals of Square Roots of Packed Single-Precision Floating-Point Values 4-572

RSQRTSS—Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value 4-574

SAHF—Store AH into Flags 4-576

SAL/SAR/SHL/SHR—Shift 4-578

SARX/SHLX/SHRX — Shift Without Affecting Flags 4-583

SBB—Integer Subtraction with Borrow 4-585

SCAS/SCASB/SCASW/SCASD—Scan String 4-588

SETcc—Set Byte on Condition 4-592

SFENCE—Store Fence 4-595

SGDT—Store Global Descriptor Table Register 4-596

SHA1RNDS4—Perform Four Rounds of SHA1 Operation 4-598

SHA1NEXTE—Calculate SHA1 State Variable E after Four Rounds 4-600

SHA1MSG1—Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords 4-601

SHA1MSG2—Perform a Final Calculation for the Next Four SHA1 Message Dwords 4-602

SHA256RNDS2—Perform Two Rounds of SHA256 Operation 4-603

SHA256MSG1—Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords 4-605

SHA256MSG2—Perform a Final Calculation for the Next Four SHA256 Message Dwords 4-606

SHLD—Double Precision Shift Left 4-607

SHRD—Double Precision Shift Right 4-610

SHUFPD—Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values 4-613

SHUFPS—Packed Interleave Shuffle of Quadruplets of Single-Precision Floating-Point Values 4-618

SIDT—Store Interrupt Descriptor Table Register 4-622

SLDT—Store Local Descriptor Table Register 4-624

SMSW—Store Machine Status Word 4-626

SQRTPD—Square Root of Double-Precision Floating-Point Values 4-628

SQRTPS—Square Root of Single-Precision Floating-Point Values 4-631

SQRTSD—Compute Square Root of Scalar Double-Precision Floating-Point Value 4-634

SQRTSS—Compute Square Root of Scalar Single-Precision Value 4-636

STAC—Set AC Flag in EFLAGS Register 4-638

STC—Set Carry Flag 4-639

STD—Set Direction Flag 4-640

STI—Set Interrupt Flag 4-641

STMXCSR—Store MXCSR Register State 4-643

STOS/STOSB/STOSW/STOSD/STOSQ—Store String 4-644

STR—Store Task Register 4-648

SUB—Subtract 4-650

SUBPD—Subtract Packed Double-Precision Floating-Point Values 4-652

SUBPS—Subtract Packed Single-Precision Floating-Point Values 4-655

SUBSD—Subtract Scalar Double-Precision Floating-Point Value 4-658

SUBSS—Subtract Scalar Single-Precision Floating-Point Value 4-660

SWAPGS—Swap GS Base Register 4-662

SYSCALL—Fast System Call 4-664

SYSENTER—Fast System Call 4-666

SYSEXIT—Fast Return from Fast System Call 4-669

SYSRET—Return From Fast System Call 4-672

TEST—Logical Compare 4-675

TZCNT — Count the Number of Trailing Zero Bits 4-677

UCOMISD—Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS 4-679

UCOMISS—Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS 4-681

UD—Undefined Instruction 4-683

UNPCKHPD—Unpack and Interleave High Packed Double-Precision Floating-Point Values 4-684

UNPCKHPS—Unpack and Interleave High Packed Single-Precision Floating-Point Values 4-688

UNPCKLPD—Unpack and Interleave Low Packed Double-Precision Floating-Point Values 4-692

UNPCKLPS—Unpack and Interleave Low Packed Single-Precision Floating-Point Values 4-696

CHAPTER 5

INSTRUCTION SET REFERENCE, V-Z

    1. TERNARY BIT VECTOR LOGIC TABLE 5-1

    2. INSTRUCTIONS (V-Z) 5-4

VALIGND/VALIGNQ—Align Doubleword/Quadword Vectors 5-5

VBLENDMPD/VBLENDMPS—Blend Float64/Float32 Vectors Using an OpMask Control 5-9

VBROADCAST—Load with Broadcast Floating-Point Data 5-12

VCOMPRESSPD—Store Sparse Packed Double-Precision Floating-Point Values into Dense Memory 5-20

VCOMPRESSPS—Store Sparse Packed Single-Precision Floating-Point Values into Dense Memory 5-22

VCVTPD2QQ—Convert Packed Double-Precision Floating-Point Values to Packed Quadword Integers 5-24

VCVTPD2UDQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Doubleword Integers . 5-27 VCVTPD2UQQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Quadword Integers . . . 5-30 VCVTPH2PS—Convert 16-bit FP values to Single-Precision FP values 5-33

VCVTPS2PH—Convert Single-Precision FP value to 16-bit FP value 5-36

VCVTPS2UDQ—Convert Packed Single-Precision Floating-Point Values to Packed Unsigned Doubleword Integer

Values 5-40

VCVTPS2QQ—Convert Packed Single Precision Floating-Point Values to Packed Singed Quadword Integer Values . . 5-43 VCVTPS2UQQ—Convert Packed Single Precision Floating-Point Values to Packed Unsigned Quadword Integer

Values 5-46

VCVTQQ2PD—Convert Packed Quadword Integers to Packed Double-Precision Floating-Point Values 5-49

VCVTQQ2PS—Convert Packed Quadword Integers to Packed Single-Precision Floating-Point Values 5-51

VCVTSD2USI—Convert Scalar Double-Precision Floating-Point Value to Unsigned Doubleword Integer 5-53

VCVTSS2USI—Convert Scalar Single-Precision Floating-Point Value to Unsigned Doubleword Integer 5-54

VCVTTPD2QQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Quadword

Integers 5-56

VCVTTPD2UDQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Unsigned Doubleword Integers 5-58

VCVTTPD2UQQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Unsigned Quadword Integers 5-61

VCVTTPS2UDQ—Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Unsigned Doubleword Integer Values 5-63

VCVTTPS2QQ—Convert with Truncation Packed Single Precision Floating-Point Values to Packed Singed Quadword Integer Values. 5-65

VCVTTPS2UQQ—Convert with Truncation Packed Single Precision Floating-Point Values to Packed Unsigned

Quadword Integer Values 5-67

VCVTTSD2USI—Convert with Truncation Scalar Double-Precision Floating-Point Value to Unsigned Integer 5-69

VCVTTSS2USI—Convert with Truncation Scalar Single-Precision Floating-Point Value to Unsigned Integer 5-70

VCVTUDQ2PD—Convert Packed Unsigned Doubleword Integers to Packed Double-Precision Floating-Point Values . 5-72 VCVTUDQ2PS—Convert Packed Unsigned Doubleword Integers to Packed Single-Precision Floating-Point Values . . 5-74 VCVTUQQ2PD—Convert Packed Unsigned Quadword Integers to Packed Double-Precision Floating-Point Values . . . 5-76 VCVTUQQ2PS—Convert Packed Unsigned Quadword Integers to Packed Single-Precision Floating-Point Values 5-78

VCVTUSI2SD—Convert Unsigned Integer to Scalar Double-Precision Floating-Point Value 5-80

VCVTUSI2SS—Convert Unsigned Integer to Scalar Single-Precision Floating-Point Value 5-82

VDBPSADBW—Double Block Packed Sum-Absolute-Differences (SAD) on Unsigned Bytes 5-84

VEXPANDPD—Load Sparse Packed Double-Precision Floating-Point Values from Dense Memory 5-88

VEXPANDPS—Load Sparse Packed Single-Precision Floating-Point Values from Dense Memory 5-90

VERR/VERW—Verify a Segment for Reading or Writing 5-92

VEXTRACTF128/VEXTRACTF32x4/VEXTRACTF64x2/VEXTRACTF32x8/VEXTRACTF64x4—Extract Packed

Floating-Point Values 5-94

VEXTRACTI128/VEXTRACTI32x4/VEXTRACTI64x2/VEXTRACTI32x8/VEXTRACTI64x4—Extract packed Integer

Values 5-100

VFIXUPIMMPD—Fix Up Special Packed Float64 Values 5-106

VFIXUPIMMPS—Fix Up Special Packed Float32 Values 5-110

VFIXUPIMMSD—Fix Up Special Scalar Float64 Value 5-114

VFIXUPIMMSS—Fix Up Special Scalar Float32 Value 5-117

VFMADD132PD/VFMADD213PD/VFMADD231PD—Fused Multiply-Add of Packed Double-Precision Floating-Point

Values 5-120

VFMADD132PS/VFMADD213PS/VFMADD231PS—Fused Multiply-Add of Packed Single-Precision Floating-Point

Values 5-127

VFMADD132SD/VFMADD213SD/VFMADD231SD—Fused Multiply-Add of Scalar Double-Precision Floating-Point

Values 5-134

VFMADD132SS/VFMADD213SS/VFMADD231SS—Fused Multiply-Add of Scalar Single-Precision Floating-Point

Values 5-137

VFMADDSUB132PD/VFMADDSUB213PD/VFMADDSUB231PD—Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values 5-140

VFMADDSUB132PS/VFMADDSUB213PS/VFMADDSUB231PS—Fused Multiply-Alternating Add/Subtract of Packed

Single-Precision Floating-Point Values 5-150

VFMSUBADD132PD/VFMSUBADD213PD/VFMSUBADD231PD—Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values 5-159

VFMSUBADD132PS/VFMSUBADD213PS/VFMSUBADD231PS—Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values 5-169

VFMSUB132PD/VFMSUB213PD/VFMSUB231PD—Fused Multiply-Subtract of Packed Double-Precision

Floating-Point Values 5-179

VFMSUB132PS/VFMSUB213PS/VFMSUB231PS—Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values 5-186

VFMSUB132SD/VFMSUB213SD/VFMSUB231SD—Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values 5-193

VFMSUB132SS/VFMSUB213SS/VFMSUB231SS—Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values 5-196

VFNMADD132PD/VFNMADD213PD/VFNMADD231PD—Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values 5-199

VFNMADD132PS/VFNMADD213PS/VFNMADD231PS—Fused Negative Multiply-Add of Packed Single-Precision

Floating-Point Values 5-206

VFNMADD132SD/VFNMADD213SD/VFNMADD231SD—Fused Negative Multiply-Add of Scalar Double-Precision

Floating-Point Values 5-212

VFNMADD132SS/VFNMADD213SS/VFNMADD231SS—Fused Negative Multiply-Add of Scalar Single-Precision

Floating-Point Values 5-215

VFNMSUB132PD/VFNMSUB213PD/VFNMSUB231PD—Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values 5-218

VFNMSUB132PS/VFNMSUB213PS/VFNMSUB231PS—Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values 5-224

VFNMSUB132SD/VFNMSUB213SD/VFNMSUB231SD—Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values 5-230

VFNMSUB132SS/VFNMSUB213SS/VFNMSUB231SS—Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values 5-233

VFPCLASSPD—Tests Types Of a Packed Float64 Values 5-236

VFPCLASSPS—Tests Types Of a Packed Float32 Values 5-239

VFPCLASSSD—Tests Types Of a Scalar Float64 Values 5-241

VFPCLASSSS—Tests Types Of a Scalar Float32 Values 5-243

VGATHERDPD/VGATHERQPD — Gather Packed DP FP Values Using Signed Dword/Qword Indices 5-245

VGATHERDPS/VGATHERQPS — Gather Packed SP FP values Using Signed Dword/Qword Indices. 5-250

VGATHERDPS/VGATHERDPD—Gather Packed Single, Packed Double with Signed Dword 5-255

VGATHERQPS/VGATHERQPD—Gather Packed Single, Packed Double with Signed Qword Indices 5-258

VGETEXPPD—Convert Exponents of Packed DP FP Values to DP FP Values 5-261

VGETEXPPS—Convert Exponents of Packed SP FP Values to SP FP Values 5-264

VGETEXPSD—Convert Exponents of Scalar DP FP Values to DP FP Value 5-268

VGETEXPSS—Convert Exponents of Scalar SP FP Values to SP FP Value 5-270

VGETMANTPD—Extract Float64 Vector of Normalized Mantissas from Float64 Vector 5-272

VGETMANTPS—Extract Float32 Vector of Normalized Mantissas from Float32 Vector 5-276

VGETMANTSD—Extract Float64 of Normalized Mantissas from Float64 Scalar 5-279

VGETMANTSS—Extract Float32 Vector of Normalized Mantissa from Float32 Vector 5-281

VINSERTF128/VINSERTF32x4/VINSERTF64x2/VINSERTF32x8/VINSERTF64x4—Insert Packed Floating-Point

Values 5-283

VINSERTI128/VINSERTI32x4/VINSERTI64x2/VINSERTI32x8/VINSERTI64x4—Insert Packed Integer Values 5-287

VMASKMOV—Conditional SIMD Packed Loads and Stores 5-291

VPBLENDD — Blend Packed Dwords 5-294

VPBLENDMB/VPBLENDMW—Blend Byte/Word Vectors Using an Opmask Control 5-296

VPBLENDMD/VPBLENDMQ—Blend Int32/Int64 Vectors Using an OpMask Control 5-298

VPBROADCASTB/W/D/Q—Load with Broadcast Integer Data from General Purpose Register 5-301

VPBROADCAST—Load Integer and Broadcast 5-304

VPBROADCASTM—Broadcast Mask to Vector Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-313

VPCMPB/VPCMPUB—Compare Packed Byte Values Into Mask 5-315

VPCMPD/VPCMPUD—Compare Packed Integer Values into Mask 5-318

VPCMPQ/VPCMPUQ—Compare Packed Integer Values into Mask 5-321

VPCMPW/VPCMPUW—Compare Packed Word Values Into Mask 5-324

VPCOMPRESSD—Store Sparse Packed Doubleword Integer Values into Dense Memory/Register 5-327

VPCOMPRESSQ—Store Sparse Packed Quadword Integer Values into Dense Memory/Register 5-329

VPCONFLICTD/Q—Detect Conflicts Within a Vector of Packed Dword/Qword Values into Dense Memory/ Register. 5-331 VPERM2F128 — Permute Floating-Point Values 5-334

VPERM2I128 — Permute Integer Values 5-336

VPERMB—Permute Packed Bytes Elements 5-338

VPERMD/VPERMW—Permute Packed Doublewords/Words Elements 5-340

VPERMI2B—Full Permute of Bytes from Two Tables Overwriting the Index 5-343

VPERMI2W/D/Q/PS/PD—Full Permute From Two Tables Overwriting the Index 5-345

VPERMILPD—Permute In-Lane of Pairs of Double-Precision Floating-Point Values 5-351

VPERMILPS—Permute In-Lane of Quadruples of Single-Precision Floating-Point Values 5-356

VPERMPD—Permute Double-Precision Floating-Point Elements 5-361

VPERMPS—Permute Single-Precision Floating-Point Elements 5-364

VPERMQ—Qwords Element Permutation 5-367

VPERMT2B—Full Permute of Bytes from Two Tables Overwriting a Table 5-370

VPERMT2W/D/Q/PS/PD—Full Permute from Two Tables Overwriting one Table 5-372

VPEXPANDD—Load Sparse Packed Doubleword Integer Values from Dense Memory / Register 5-377

VPEXPANDQ—Load Sparse Packed Quadword Integer Values from Dense Memory / Register 5-379

VPGATHERDD/VPGATHERQD — Gather Packed Dword Values Using Signed Dword/Qword Indices 5-381

VPGATHERDD/VPGATHERDQ—Gather Packed Dword, Packed Qword with Signed Dword Indices 5-385

VPGATHERDQ/VPGATHERQQ — Gather Packed Qword Values Using Signed Dword/Qword Indices 5-388

VPGATHERQD/VPGATHERQQ—Gather Packed Dword, Packed Qword with Signed Qword Indices 5-392

VPLZCNTD/Q—Count the Number of Leading Zero Bits for Packed Dword, Packed Qword Values . . . . . . . . . . . . . . 5-395 VPMADD52HUQ—Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit

Accumulators 5-398

VPMADD52LUQ—Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword

Accumulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-400 VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores 5-402

VPMOVB2M/VPMOVW2M/VPMOVD2M/VPMOVQ2M—Convert a Vector Register to a Mask 5-405

VPMOVDB/VPMOVSDB/VPMOVUSDB—Down Convert DWord to Byte 5-408

VPMOVDW/VPMOVSDW/VPMOVUSDW—Down Convert DWord to Word 5-412

VPMOVM2B/VPMOVM2W/VPMOVM2D/VPMOVM2Q—Convert a Mask Register to a Vector Register 5-416

VPMOVQB/VPMOVSQB/VPMOVUSQB—Down Convert QWord to Byte 5-419

VPMOVQD/VPMOVSQD/VPMOVUSQD—Down Convert QWord to DWord 5-423

VPMOVQW/VPMOVSQW/VPMOVUSQW—Down Convert QWord to Word 5-427

VPMOVWB/VPMOVSWB/VPMOVUSWB—Down Convert Word to Byte 5-431

VPMULTISHIFTQB – Select Packed Unaligned Bytes from Quadword Sources 5-435

VPROLD/VPROLVD/VPROLQ/VPROLVQ—Bit Rotate Left 5-437

VPRORD/VPRORVD/VPRORQ/VPRORVQ—Bit Rotate Right 5-442

VPSCATTERDD/VPSCATTERDQ/VPSCATTERQD/VPSCATTERQQ—Scatter Packed Dword, Packed Qword with

Signed Dword, Signed Qword Indices 5-447

VPSLLVW/VPSLLVD/VPSLLVQ—Variable Bit Shift Left Logical 5-451

VPSRAVW/VPSRAVD/VPSRAVQ—Variable Bit Shift Right Arithmetic 5-456

VPSRLVW/VPSRLVD/VPSRLVQ—Variable Bit Shift Right Logical 5-461

VPTERNLOGD/VPTERNLOGQ—Bitwise Ternary Logic 5-466

VPTESTMB/VPTESTMW/VPTESTMD/VPTESTMQ—Logical AND and Set Mask 5-469

VPTESTNMB/W/D/Q—Logical NAND and Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-472 VRANGEPD—Range Restriction Calculation For Packed Pairs of Float64 Values 5-476

VRANGEPS—Range Restriction Calculation For Packed Pairs of Float32 Values 5-481

VRANGESD—Range Restriction Calculation From a pair of Scalar Float64 Values 5-485

VRANGESS—Range Restriction Calculation From a Pair of Scalar Float32 Values 5-488

VRCP14PD—Compute Approximate Reciprocals of Packed Float64 Values 5-491

VRCP14SD—Compute Approximate Reciprocal of Scalar Float64 Value 5-493

VRCP14PS—Compute Approximate Reciprocals of Packed Float32 Values 5-495

VRCP14SS—Compute Approximate Reciprocal of Scalar Float32 Value 5-497

VREDUCEPD—Perform Reduction Transformation on Packed Float64 Values 5-499

VREDUCESD—Perform a Reduction Transformation on a Scalar Float64 Value 5-502

VREDUCEPS—Perform Reduction Transformation on Packed Float32 Values 5-504

VREDUCESS—Perform a Reduction Transformation on a Scalar Float32 Value 5-506

VRNDSCALEPD—Round Packed Float64 Values To Include A Given Number Of Fraction Bits 5-508

VRNDSCALESD—Round Scalar Float64 Value To Include A Given Number Of Fraction Bits 5-512

VRNDSCALEPS—Round Packed Float32 Values To Include A Given Number Of Fraction Bits 5-514

VRNDSCALESS—Round Scalar Float32 Value To Include A Given Number Of Fraction Bits 5-517

VRSQRT14PD—Compute Approximate Reciprocals of Square Roots of Packed Float64 Values 5-519

VRSQRT14SD—Compute Approximate Reciprocal of Square Root of Scalar Float64 Value 5-521

VRSQRT14PS—Compute Approximate Reciprocals of Square Roots of Packed Float32 Values 5-523

VRSQRT14SS—Compute Approximate Reciprocal of Square Root of Scalar Float32 Value 5-525

VSCALEFPD—Scale Packed Float64 Values With Float64 Values 5-527

VSCALEFSD—Scale Scalar Float64 Values With Float64 Values 5-530

VSCALEFPS—Scale Packed Float32 Values With Float32 Values 5-532

VSCALEFSS—Scale Scalar Float32 Value With Float32 Value 5-535

VSCATTERDPS/VSCATTERDPD/VSCATTERQPS/VSCATTERQPD—Scatter Packed Single, Packed Double with

Signed Dword and Qword Indices 5-537

VSHUFF32x4/VSHUFF64x2/VSHUFI32x4/VSHUFI64x2—Shuffle Packed Values at 128-bit Granularity 5-541

VTESTPD/VTESTPS—Packed Bit Test 5-546

VZEROALL—Zero All YMM Registers 5-549

VZEROUPPER—Zero Upper Bits of YMM Registers 5-550

WAIT/FWAIT—Wait 5-551

WBINVD—Write Back and Invalidate Cache 5-552

WRFSBASE/WRGSBASE—Write FS/GS Segment Base 5-554

WRMSR—Write to Model Specific Register 5-556

WRPKRU—Write Data to User Page Key Register 5-558

XACQUIRE/XRELEASE — Hardware Lock Elision Prefix Hints 5-559

XABORT — Transactional Abort 5-563

XADD—Exchange and Add 5-565

XBEGIN — Transactional Begin 5-567

XCHG—Exchange Register/Memory with Register 5-570

XEND — Transactional End 5-572

XGETBV—Get Value of Extended Control Register 5-574

XLAT/XLATB—Table Look-up Translation 5-576

XOR—Logical Exclusive OR 5-578

XORPD—Bitwise Logical XOR of Packed Double Precision Floating-Point Values 5-580

XORPS—Bitwise Logical XOR of Packed Single Precision Floating-Point Values 5-583

XRSTOR—Restore Processor Extended States 5-586

XRSTORS—Restore Processor Extended States Supervisor 5-591

XSAVE—Save Processor Extended States 5-595

XSAVEC—Save Processor Extended States with Compaction 5-598

XSAVEOPT—Save Processor Extended States Optimized 5-601

XSAVES—Save Processor Extended States Supervisor 5-604

XSETBV—Set Extended Control Register 5-608

XTEST — Test If In Transactional Execution 5-610

CHAPTER 6

SAFER MODE EXTENSIONS REFERENCE

    1. OVERVIEW 6-1

    2. SMX FUNCTIONALITY 6-1

      1. Detecting and Enabling SMX 6-1

      2. SMX Instruction Summary 6-2

        1. GETSEC[CAPABILITIES] 6-2

        2. GETSEC[ENTERACCS] 6-3

        3. GETSEC[EXITAC] 6-3

        4. GETSEC[SENTER] 6-3

        5. GETSEC[SEXIT] 6-4

        6. GETSEC[PARAMETERS] 6-4

        7. GETSEC[SMCTRL] 6-4

        8. GETSEC[WAKEUP] 6-4

6.2.3 Measured Environment and SMX 6-4

6.3 GETSEC LEAF FUNCTIONS 6-5

GETSEC[CAPABILITIES] - Report the SMX Capabilities 6-7

GETSEC[ENTERACCS] - Execute Authenticated Chipset Code 6-10

GETSEC[EXITAC]—Exit Authenticated Code Execution Mode 6-18

GETSEC[SENTER]—Enter a Measured Environment 6-21

GETSEC[SEXIT]—Exit Measured Environment 6-30

GETSEC[PARAMETERS]—Report the SMX Parameters 6-33

GETSEC[SMCTRL]—SMX Mode Control 6-37

GETSEC[WAKEUP]—Wake up sleeping processors in measured environment 6-40


CHAPTER 7

INSTRUCTION SET REFERENCE UNIQUE TO INTEL® XEON PHI™ PROCESSORS

PREFETCHWT1—Prefetch Vector Data Into Caches with Intent to Write and T1 Hint 6-2

V4FMADDPS/V4FNMADDPS — Packed Single-Precision Floating-Point Fused Multiply-Add (4-iterations) 6-4

V4FMADDSS/V4FNMADDSS —Scalar Single-Precision Floating-Point Fused Multiply-Add (4-iterations) 6-6

VEXP2PD—Approximation to the Exponential 2^x of Packed Double-Precision Floating-Point Values with Less Than

2^-23 Relative Error 6-8

VEXP2PS—Approximation to the Exponential 2^x of Packed Single-Precision Floating-Point Values with Less Than

2^-23 Relative Error 6-10

VGATHERPF0DPS/VGATHERPF0QPS/VGATHERPF0DPD/VGATHERPF0QPD—Sparse Prefetch Packed SP/DP Data

Values with Signed Dword, Signed Qword Indices Using T0 Hint 6-12

VGATHERPF1DPS/VGATHERPF1QPS/VGATHERPF1DPD/VGATHERPF1QPD—Sparse Prefetch Packed SP/DP Data

Values with Signed Dword, Signed Qword Indices Using T1 Hint 6-14

VP4DPWSSDS — Dot Product of Signed Words with Dword Accumulation and Saturation (4-iterations) 6-16

VP4DPWSSD — Dot Product of Signed Words with Dword Accumulation (4-iterations) 6-18

VRCP28PD—Approximation to the Reciprocal of Packed Double-Precision Floating-Point Values with Less Than

2^-28 Relative Error 6-20

VRCP28SD—Approximation to the Reciprocal of Scalar Double-Precision Floating-Point Value with Less Than

2^-28 Relative Error 6-22

VRCP28PS—Approximation to the Reciprocal of Packed Single-Precision Floating-Point Values with Less Than

2^-28 Relative Error 6-24

VRCP28SS—Approximation to the Reciprocal of Scalar Single-Precision Floating-Point Value with Less Than

2^-28 Relative Error 6-26

VRSQRT28PD—Approximation to the Reciprocal Square Root of Packed Double-Precision Floating-Point Values

with Less Than 2^-28 Relative Error 6-28

VRSQRT28SD—Approximation to the Reciprocal Square Root of Scalar Double-Precision Floating-Point Value

with Less Than 2^-28 Relative Error 6-30

VRSQRT28PS—Approximation to the Reciprocal Square Root of Packed Single-Precision Floating-Point Values

with Less Than 2^-28 Relative Error 6-32

VRSQRT28SS—Approximation to the Reciprocal Square Root of Scalar Single-Precision Floating-Point Value with

Less Than 2^-28 Relative Error 6-34

VSCATTERPF0DPS/VSCATTERPF0QPS/VSCATTERPF0DPD/VSCATTERPF0QPD—Sparse Prefetch Packed SP/DP

Data Values with Signed Dword, Signed Qword Indices Using T0 Hint with Intent to Write 6-36

VSCATTERPF1DPS/VSCATTERPF1QPS/VSCATTERPF1DPD/VSCATTERPF1QPD—Sparse Prefetch Packed SP/DP

Data Values with Signed Dword, Signed Qword Indices Using T1 Hint with Intent to Write 6-38


APPENDIX A OPCODE MAP

A.1 USING OPCODE TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1

A.2 KEY TO ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1

A.2.1 Codes for Addressing Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A-1

A.2.2 Codes for Operand Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A-2

A.2.3 Register Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A-3

A.2.4 Opcode Look-up Examples for One, Two, and Three-Byte Opcodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A-3 A.2.4.1 One-Byte Opcode Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3

A.2.4.2 Two-Byte Opcode Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-4

A.2.4.3 Three-Byte Opcode Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5

A.2.4.4 VEX Prefix Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5

A.2.5 Superscripts Utilized in Opcode Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A-6

A.3 ONE, TWO, AND THREE-BYTE OPCODE MAPS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6

A.4 OPCODE EXTENSIONS FOR ONE-BYTE AND TWO-BYTE OPCODES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-17

A.4.1 Opcode Look-up Examples Using Opcode Extensions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-17

A.4.2 Opcode Extension Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-17

A.5 ESCAPE OPCODE INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-20

A.5.1 Opcode Look-up Examples for Escape Instruction Opcodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-20

A.5.2 Escape Opcode Instruction Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-20

A.5.2.1 Escape Opcodes with D8 as First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-20

A.5.2.2 Escape Opcodes with D9 as First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-21

A.5.2.3 Escape Opcodes with DA as First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-22

A.5.2.4 Escape Opcodes with DB as First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-23

A.5.2.5 Escape Opcodes with DC as First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-24

A.5.2.6 Escape Opcodes with DD as First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-25

A.5.2.7 Escape Opcodes with DE as First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-26

A.5.2.8 Escape Opcodes with DF As First Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-27


APPENDIX B

INSTRUCTION FORMATS AND ENCODINGS

B.1 MACHINE INSTRUCTION FORMAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1

B.1.1 Legacy Prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-1

B.1.2 REX Prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-2

B.1.3 Opcode Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-2

B.1.4 Special Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-2

B.1.4.1 Reg Field (reg) for Non-64-Bit Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-3

B.1.4.2 Reg Field (reg) for 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-4

B.1.4.3 Encoding of Operand Size (w) Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-4

B.1.4.4 Sign-Extend (s) Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-5

B.1.4.5 Segment Register (sreg) Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-5

B.1.4.6 Special-Purpose Register (eee) Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-5

B.1.4.7 Condition Test (tttn) Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-6

B.1.4.8 Direction (d) Bit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-6

B.1.5 Other Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-6

B.2 GENERAL-PURPOSE INSTRUCTION FORMATS AND ENCODINGS FOR NON-64-BIT MODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-7

B.2.1 General Purpose Instruction Formats and Encodings for 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-18

    1. PENTIUM® PROCESSOR FAMILY INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-38

    2. 64-BIT MODE INSTRUCTION ENCODINGS FOR SIMD INSTRUCTION EXTENSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-38 B.5 MMX INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-39 B.5.1 Granularity Field (gg). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-39

B.5.2 MMX Technology and General-Purpose Register Fields (mmxreg and reg) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-39 B.5.3 MMX Instruction Formats and Encodings Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-39

B.6 PROCESSOR EXTENDED STATE INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-42 B.7 P6 FAMILY INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-42 B.8 SSE INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-43 B.9 SSE2 INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-49 B.9.1 Granularity Field (gg). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-49 B.10 SSE3 FORMATS AND ENCODINGS TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-60 B.11 SSSE3 FORMATS AND ENCODING TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-61

B.12 AESNI AND PCLMULQDQ INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-64 B.13 SPECIAL ENCODINGS FOR 64-BIT MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-65 B.14 SSE4.1 FORMATS AND ENCODING TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-67 B.15 SSE4.2 FORMATS AND ENCODING TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-72 B.16 AVX FORMATS AND ENCODING TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-74 B.17 FLOATING-POINT INSTRUCTION FORMATS AND ENCODINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-114 B.18 VMX INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-118 B.19 SMX INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-119


APPENDIX C

INTEL® C/C++ COMPILER INTRINSICS AND FUNCTIONAL EQUIVALENTS

C.1 SIMPLE INTRINSICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2

C.2 COMPOSITE INTRINSICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-14

FIGURES

Figure 1-1. Bit and Byte Order 1-5

Figure 1-2. Syntax for CPUID, CR, and MSR Data Presentation 1-7

Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format 2-1

Figure 2-2. Table Interpretation of ModR/M Byte (C8H) 2-4

Figure 2-3. Prefix Ordering in 64-bit Mode 2-8

Figure 2-4. Memory Addressing Without an SIB Byte; REX.X Not Used 2-9

Figure 2-5. Register-Register Addressing (No Memory Operand); REX.X Not Used 2-9

Figure 2-6. Memory Addressing With a SIB Byte 2-10

Figure 2-7. Register Operand Coded in Opcode Byte; REX.X & REX.R Not Used 2-10

Figure 2-8. Instruction Encoding Format with VEX Prefix 2-13

Figure 2-9. VEX bit fields 2-15

Figure 2-10. AVX-512 Instruction Format and the EVEX Prefix 2-36

Figure 2-11. Bit Field Layout of the EVEX Prefix 2-36

Figure 3-1. Bit Offset for BIT[RAX, 21] 3-12

Figure 3-2. Memory Bit Indexing 3-13

Figure 3-3. ADDSUBPD—Packed Double-FP Add/Subtract 3-45

Figure 3-4. ADDSUBPS—Packed Single-FP Add/Subtract 3-47

Figure 3-5. Memory Layout of BNDMOV to/from Memory 3-101

Figure 3-6. Version Information Returned by CPUID in EAX 3-207

Figure 3-7. Feature Information Returned in the ECX Register 3-209

Figure 3-8. Feature Information Returned in the EDX Register 3-211

Figure 3-9. Determination of Support for the Processor Brand String 3-220

Figure 3-10. Algorithm for Extracting Processor Frequency 3-221

Figure 3-11. CVTDQ2PD (VEX.256 encoded version) 3-232

Figure 3-12. VCVTPD2DQ (VEX.256 encoded version) 3-239

Figure 3-13. VCVTPD2PS (VEX.256 encoded version) 3-244

Figure 3-14. CVTPS2PD (VEX.256 encoded version) 3-253

Figure 3-15. VCVTTPD2DQ (VEX.256 encoded version) 3-269

Figure 3-16. HADDPD—Packed Double-FP Horizontal Add 3-430

Figure 3-17. VHADDPD operation 3-431

Figure 3-18. HADDPS—Packed Single-FP Horizontal Add 3-434

Figure 3-19. VHADDPS operation 3-434

Figure 3-20. HSUBPD—Packed Double-FP Horizontal Subtract 3-437

Figure 3-21. VHSUBPD operation 3-438

Figure 3-22. HSUBPS—Packed Single-FP Horizontal Subtract 3-441

Figure 3-23. VHSUBPS operation 3-441

Figure 3-24. INVPCID Descriptor 3-477

Figure 4-1. Operation of PCMPSTRx and PCMPESTRx 4-6

Figure 4-2. VMOVDDUP Operation 4-60

Figure 4-3. MOVSHDUP Operation 4-114

Figure 4-4. MOVSLDUP Operation 4-117

Figure 4-5. 256-bit VMPSADBW Operation 4-137

Figure 4-6. Operation of the PACKSSDW Instruction Using 64-bit Operands 4-187

Figure 4-7. 256-bit VPALIGN Instruction Operation 4-220

Figure 4-8. PDEP Example 4-270

Figure 4-9. PEXT Example 4-272

Figure 4-10. 256-bit VPHADDD Instruction Operation 4-281

Figure 4-11. PMADDWD Execution Model Using 64-bit Operands 4-302

Figure 4-12. PMULHUW and PMULHW Instruction Operation Using 64-bit Operands 4-366

Figure 4-13. PMULLU Instruction Operation Using 64-bit Operands 4-378

Figure 4-14. PSADBW Instruction Operation Using 64-bit Operands 4-405

Figure 4-15. PSHUFB with 64-Bit Operands 4-410

Figure 4-16. 256-bit VPSHUFD Instruction Operation 4-413

Figure 4-17. PSLLW, PSLLD, and PSLLQ Instruction Operation Using 64-bit Operand 4-431

Figure 4-18. PSRAW and PSRAD Instruction Operation Using a 64-bit Operand 4-443

Figure 4-19. PSRLW, PSRLD, and PSRLQ Instruction Operation Using 64-bit Operand 4-455

Figure 4-20. PUNPCKHBW Instruction Operation Using 64-bit Operands 4-489

Figure 4-21. 256-bit VPUNPCKHDQ Instruction Operation 4-489

Figure 4-22. PUNPCKLBW Instruction Operation Using 64-bit Operands 4-499

Figure 4-23. 256-bit VPUNPCKLDQ Instruction Operation 4-499

Figure 4-24. Bit Control Fields of Immediate Byte for ROUNDxx Instruction 4-561

Figure 4-25. 256-bit VSHUFPD Operation of Four Pairs of DP FP Values 4-614

Figure 4-26. 256-bit VSHUFPS Operation of Selection from Input Quadruplet and Pair-wise Interleaved Result 4-619

Figure 4-27. VUNPCKHPS Operation 4-689

Figure 4-28. VUNPCKLPS Operation 4-697

Figure 5-1. VBROADCASTSS Operation (VEX.256 encoded version) 5-14

Figure 5-2. VBROADCASTSS Operation (VEX.128-bit version) 5-14

Figure 5-3. VBROADCASTSD Operation (VEX.256-bit version) 5-14

Figure 5-4. VBROADCASTF128 Operation (VEX.256-bit version) 5-14

Figure 5-5. VBROADCASTF64X4 Operation (512-bit version with writemask all 1s) 5-15

Figure 5-6. VCVTPH2PS (128-bit Version) 5-34

Figure 5-7. VCVTPS2PH (128-bit Version) 5-36

Figure 5-8. 64-bit Super Block of SAD Operation in VDBPSADBW 5-85

Figure 5-9. VFIXUPIMMPD Immediate Control Description 5-109

Figure 5-10. VFIXUPIMMPS Immediate Control Description 5-113

Figure 5-11. VFIXUPIMMSD Immediate Control Description 5-116

Figure 5-12. VFIXUPIMMSS Immediate Control Description 5-119

Figure 5-13. Imm8 Byte Specifier of Special Case FP Values for VFPCLASSPD/SD/PS/SS 5-236

Figure 5-14. VGETEXPPS Functionality On Normal Input values 5-265

Figure 5-15. Imm8 Controls for VGETMANTPD/SD/PS/SS 5-272

Figure 5-16. VPBROADCASTD Operation (VEX.256 encoded version) 5-306

Figure 5-17. VPBROADCASTD Operation (128-bit version) 5-306

Figure 5-18. VPBROADCASTQ Operation (256-bit version) 5-306

Figure 5-19. VBROADCASTI128 Operation (256-bit version) 5-307

Figure 5-20. VBROADCASTI256 Operation (512-bit version) 5-307

Figure 5-21. VPERM2F128 Operation 5-334

Figure 5-22. VPERM2I128 Operation 5-336

Figure 5-23. VPERMILPD Operation 5-352

Figure 5-24. VPERMILPD Shuffle Control 5-352

Figure 5-25. VPERMILPS Operation 5-357

Figure 5-26. VPERMILPS Shuffle Control 5-357

Figure 5-27. Imm8 Controls for VRANGEPD/SD/PS/SS 5-476

Figure 5-28. Imm8 Controls for VREDUCEPD/SD/PS/SS 5-499

Figure 5-29. Imm8 Controls for VRNDSCALEPD/SD/PS/SS 5-509

Figure 7-1. Register Source-Block Dot Product of Two Signed Word Operands with Doubleword Accumulation 6-18

Figure A-1. ModR/M Byte nnn Field (Bits 5, 4, and 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-17

Figure B-1. General Machine Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .B-1

Figure B-2. Hybrid Notation of VEX-Encoded Key Instruction Bytes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-74

TABLES

Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte 2-5

Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte 2-6

Table 2-3. 32-Bit Addressing Forms with the SIB Byte 2-7

Table 2-4. REX Prefix Fields [BITS: 0100WRXB] 2-9

Table 2-6. Direct Memory Offset Form of MOV 2-11

Table 2-5. Special Cases of REX Encodings 2-11

Table 2-7. RIP-Relative Addressing 2-12

Table 2-8. VEX.vvvv to register name mapping 2-17

Table 2-9. Instructions with a VEX.vvvv destination 2-17

Table 2-10. VEX.m-mmmm interpretation 2-18

Table 2-11. VEX.L interpretation 2-18

Table 2-12. VEX.pp interpretation 2-19

Table 2-13. 32-Bit VSIB Addressing Forms of the SIB Byte 2-20

Table 2-14. Exception class description 2-22

Table 2-15. Instructions in each Exception Class 2-23

Table 2-16. #UD Exception and VEX.W=1 Encoding 2-24

Table 2-17. #UD Exception and VEX.L Field Encoding 2-25

Table 2-18. Type 1 Class Exception Conditions 2-26

Table 2-19. Type 2 Class Exception Conditions 2-27

Table 2-20. Type 3 Class Exception Conditions 2-28

Table 2-21. Type 4 Class Exception Conditions 2-29

Table 2-22. Type 5 Class Exception Conditions 2-30

Table 2-23. Type 6 Class Exception Conditions 2-31

Table 2-24. Type 7 Class Exception Conditions 2-32

Table 2-25. Type 8 Class Exception Conditions 2-32

Table 2-26. Type 11 Class Exception Conditions 2-33

Table 2-27. Type 12 Class Exception Conditions 2-34

Table 2-28. VEX-Encoded GPR Instructions 2-35

Table 2-29. Exception Definition (VEX-Encoded GPR Instructions) 2-35

Table 2-30. EVEX Prefix Bit Field Functional Grouping 2-37

Table 2-31. 32-Register Support in 64-bit Mode Using EVEX with Embedded REX Bits 2-38

Table 2-32. EVEX Encoding Register Specifiers in 32-bit Mode 2-38

Table 2-33. Opmask Register Specifier Encoding 2-39

Table 2-34. Compressed Displacement (DISP8*N) Affected by Embedded Broadcast 2-40

Table 2-35. EVEX DISP8*N for Instructions Not Affected by Embedded Broadcast 2-40

Table 2-36. EVEX Embedded Broadcast/Rounding/SAE and Vector Length on Vector Instructions 2-42

Table 2-37. OS XSAVE Enabling Requirements of Instruction Categories 2-42

Table 2-38. Opcode Independent, State Dependent EVEX Bit Fields 2-42

Table 2-39. #UD Conditions of Operand-Encoding EVEX Prefix Bit Fields 2-43

Table 2-40. #UD Conditions of Opmask Related Encoding Field 2-43

Table 2-41. #UD Conditions Dependent on EVEX.b Context 2-44

Table 2-42. EVEX-Encoded Instruction Exception Class Summary 2-44

Table 2-43. EVEX Instructions in each Exception Class 2-45

Table 2-44. Type E1 Class Exception Conditions 2-48

Table 2-45. Type E1NF Class Exception Conditions 2-49

Table 2-46. Type E2 Class Exception Conditions 2-50

Table 2-47. Type E3 Class Exception Conditions 2-51

Table 2-48. Type E3NF Class Exception Conditions 2-52

Table 2-49. Type E4 Class Exception Conditions 2-53

Table 2-50. Type E4NF Class Exception Conditions 2-54

Table 2-51. Type E5 Class Exception Conditions 2-55

Table 2-52. Type E5NF Class Exception Conditions 2-56

Table 2-53. Type E6 Class Exception Conditions 2-57

Table 2-54. Type E6NF Class Exception Conditions 2-58

Table 2-55. Type E7NM Class Exception Conditions 2-59

Table 2-56. Type E9 Class Exception Conditions 2-60

Table 2-57. Type E9NF Class Exception Conditions 2-61

Table 2-58. Type E10 Class Exception Conditions 2-62

Table 2-59. Type E10NF Class Exception Conditions 2-63

Table 2-60. Type E11 Class Exception Conditions 2-64

Table 2-61. Type E12 Class Exception Conditions 2-65

Table 2-62. Type E12NP Class Exception Conditions 2-66

Table 2-63. TYPE K20 Exception Definition (VEX-Encoded OpMask Instructions w/o Memory Arg) 2-67

Table 2-64. TYPE K21 Exception Definition (VEX-Encoded OpMask Instructions Addressing Memory) 2-68

Table 3-1. Register Codes Associated With +rb, +rw, +rd, +ro 3-2

Table 3-2. Range of Bit Positions Specified by Bit Offset Operands 3-13

Table 3-3. Standard and Non-standard Data Types 3-15

Table 3-4. Intel 64 and IA-32 General Exceptions 3-16

Table 3-5. x87 FPU Floating-Point Exceptions 3-17

Table 3-6. SIMD Floating-Point Exceptions 3-17

Table 3-7. Decision Table for CLI Results 3-144

Table 3-1. Comparison Predicate for CMPPD and CMPPS Instructions 3-157

Table 3-2. Pseudo-Op and CMPPD Implementation 3-158

Table 3-3. Pseudo-Op and VCMPPD Implementation 3-159

Table 3-4. Pseudo-Op and CMPPS Implementation 3-164

Table 3-5. Pseudo-Op and VCMPPS Implementation 3-165

Table 3-6. Pseudo-Op and CMPSD Implementation 3-175

Table 3-7. Pseudo-Op and VCMPSD Implementation 3-175

Table 3-8. Pseudo-Op and CMPSS Implementation 3-179

Table 3-9. Pseudo-Op and VCMPSS Implementation 3-179

Table 3-8. Information Returned by CPUID Instruction 3-192

Table 3-9. Processor Type Field 3-207

Table 3-10. Feature Information Returned in the ECX Register 3-209

Table 3-11. More on Feature Information Returned in the EDX Register 3-212

Table 3-12. Encoding of CPUID Leaf 2 Descriptors 3-214

Table 3-13. Processor Brand String Returned with Pentium 4 Processor 3-221

Table 3-14. Mapping of Brand Indices; and Intel 64 and IA-32 Processor Brand Strings 3-222

Table 3-15. DIV Action 3-288

Table 3-16. Results Obtained from F2XM1 3-312

Table 3-17. Results Obtained from FABS 3-314

Table 3-18. FADD/FADDP/FIADD Results 3-316

Table 3-19. FBSTP Results 3-320

Table 3-20. FCHS Results 3-322

Table 3-21. FCOM/FCOMP/FCOMPP Results 3-328

Table 3-22. FCOMI/FCOMIP/ FUCOMI/FUCOMIP Results 3-331

Table 3-23. FCOS Results 3-334

Table 3-24. FDIV/FDIVP/FIDIV Results 3-338

Table 3-25. FDIVR/FDIVRP/FIDIVR Results 3-341

Table 3-26. FICOM/FICOMP Results 3-344

Table 3-27. FIST/FISTP Results 3-351

Table 3-28. FISTTP Results 3-354

Table 3-29. FMUL/FMULP/FIMUL Results 3-365

Table 3-30. FPATAN Results 3-368

Table 3-31. FPREM Results 3-370

Table 3-32. FPREM1 Results 3-372

Table 3-33. FPTAN Results 3-374

Table 3-34. FSCALE Results 3-382

Table 3-35. FSIN Results 3-384

Table 3-36. FSINCOS Results 3-386

Table 3-37. FSQRT Results 3-388

Table 3-38. FSUB/FSUBP/FISUB Results 3-399

Table 3-39. FSUBR/FSUBRP/FISUBR Results 3-402

Table 3-40. FTST Results 3-404

Table 3-41. FUCOM/FUCOMP/FUCOMPP Results 3-406

Table 3-42. FXAM Results 3-409

Table 3-43. Non-64-bit-Mode Layout of FXSAVE and FXRSTOR Memory Region 3-416

Table 3-44. Field Definitions 3-417

Table 3-45. Recreating FSAVE Format 3-419

Table 3-46. Layout of the 64-bit-mode FXSAVE64 Map (requires REX.W = 1) 3-420

Table 3-47. Layout of the 64-bit-mode FXSAVE Map (REX.W = 0) 3-421

Table 3-48. FYL2X Results 3-426

Table 3-49. FYL2XP1 Results 3-428

Table 3-50. IDIV Results 3-443

Table 3-51. Decision Table 3-461

Table 3-52. Segment and Gate Types 3-520

Table 3-53. Non-64-bit Mode LEA Operation with Address and Operand Size Attributes 3-529

Table 3-54. 64-bit Mode LEA Operation with Address and Operand Size Attributes 3-529

Table 3-55. Segment and Gate Descriptor Types 3-549

Table 4-1. Source Data Format 4-2

Table 4-2. Aggregation Operation 4-2

Table 4-3. Aggregation Operation 4-3

Table 4-4. Polarity 4-3

Table 4-5. Output Selection 4-4

Table 4-6. Output Selection 4-4

Table 4-7. Comparison Result for Each Element Pair BoolRes[i.j] 4-4

Table 4-8. Summary of Imm8 Control Byte 4-5

Table 4-9. MUL Results. 4-144

Table 4-10. MWAIT Extension Register (ECX) 4-159

Table 4-11. MWAIT Hints Register (EAX) 4-159

Table 4-12. Recommended Multi-Byte Sequence of NOP Instruction 4-163

Table 4-13. PCLMULQDQ Quadword Selection of Immediate Byte 4-241

Table 4-14. Pseudo-Op and PCLMULQDQ Implementation 4-241

Table 4-15. Effect of POPF/POPFD on the EFLAGS Register 4-394

Table 4-16. Valid General and Special Purpose Performance Counter Index Range for RDPMC 4-533

Table 4-17. Repeat Prefixes 4-546

Table 4-18. Rounding Modes and Encoding of Rounding Control (RC) Field 4-561

Table 4-19. Decision Table for STI Results 4-641

Table 5-1. Low 8 columns of the 16x16 Map of VPTERNLOG Boolean Logic Operations 5-2

Table 5-2. Low 8 columns of the 16x16 Map of VPTERNLOG Boolean Logic Operations 5-3

Table 5-3. Immediate Byte Encoding for 16-bit Floating-Point Conversion Instructions 5-37

Table 5-4. Classifier Operations for VFPCLASSPD/SD/PS/SS 5-236

Table 5-5. VGETEXPPD/SD Special Cases 5-261

Table 5-6. VGETEXPPS/SS Special Cases 5-264

Table 5-7. GetMant() Special Float Values Behavior 5-273

Table 5-8. Pseudo-Op and VPCMP* Implementation 5-316

Table 5-9. Examples of VPTERNLOGD/Q Imm8 Boolean Function and Input Index Values 5-467

Table 5-10. Signaling of Comparison Operation of One or More NaN Input Values and Effect of Imm8[3:2] 5-477

Table 5-11. Comparison Result for Opposite-Signed Zero Cases for MIN, MIN_ABS and MAX, MAX_ABS 5-477

Table 5-12. Comparison Result of Equal-Magnitude Input Cases for MIN_ABS and MAX_ABS, (|a| = |b|, a>0, b<0) 5-477

Table 5-13. VRCP14PD/VRCP14SD Special Cases 5-491

Table 5-14. VRCP14PS/VRCP14SS Special Cases 5-495

Table 5-15. VREDUCEPD/SD/PS/SS Special Cases 5-500

Table 5-16. VRNDSCALEPD/SD/PS/SS Special Cases 5-509

Table 5-17. VRSQRT14PD Special Cases 5-520

Table 5-18. VRSQRT14SD Special Cases 5-522

Table 5-19. VRSQRT14PS Special Cases 5-524

Table 5-20. VRSQRT14SS Special Cases 5-526

Table 5-21. \VSCALEFPD/SD/PS/SS Special Cases 5-527

Table 5-22. Additional VSCALEFPD/SD Special Cases 5-528

Table 5-23. Additional VSCALEFPS/SS Special Cases 5-532

Table 6-1. Layout of IA32_FEATURE_CONTROL 6-2

Table 6-2. GETSEC Leaf Functions 6-3

Table 6-3. Getsec Capability Result Encoding (EBX = 0) 6-7

Table 6-4. Register State Initialization after GETSEC[ENTERACCS] 6-12

Table 6-5. IA32_MISC_ENABLE MSR Initialization by ENTERACCS and SENTER 6-13

Table 6-6. Register State Initialization after GETSEC[SENTER] and GETSEC[WAKEUP] 6-24

Table 6-7. SMX Reporting Parameters Format 6-33

Table 6-8. TXT Feature Extensions Flags 6-34

Table 6-9. External Memory Types Using Parameter 3 6-35

Table 6-10. Default Parameter Values 6-35

Table 6-11. Supported Actions for GETSEC[SMCTRL(0)] 6-37

Table 6-12. RLP MVMM JOIN Data Structure 6-40

Table 6-1. Special Values Behavior 6-9

Table 6-2. Special Values Behavior 6-11

Table 6-3. VRCP28PD Special Cases 6-21

Table 6-4. VRCP28SD Special Cases 6-23

Table 6-5. VRCP28PS Special Cases 6-25

Table 6-6. VRCP28SS Special Cases 6-27

Table 6-7. VRSQRT28PD Special Cases 6-29

Table 6-8. VRSQRT28SD Special Cases 6-31

Table 6-9. VRSQRT28PS Special Cases 6-33

Table 6-10. VRSQRT28SS Special Cases 6-35

Table A-1. Superscripts Utilized in Opcode Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6

Table A-2. One-byte Opcode Map: (00H — F7H) * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7

Table A-3. Two-byte Opcode Map: 00H — 77H (First Byte is 0FH) * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9

Table A-4. Three-byte Opcode Map: 00H — F7H (First Two Bytes are 0F 38H) * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-13

Table A-5. Three-byte Opcode Map: 00H — F7H (First two bytes are 0F 3AH) * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-15

Table A-6. Opcode Extensions for One- and Two-byte Opcodes by Group Number * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-18

Table A-7. D8 Opcode Map When ModR/M Byte is Within 00H to BFH *. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-20

Table A-8. D8 Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-21

Table A-9. D9 Opcode Map When ModR/M Byte is Within 00H to BFH *. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-21

Table A-10. D9 Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-22

Table A-11. DA Opcode Map When ModR/M Byte is Within 00H to BFH *. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-22

Table A-12. DA Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-23

Table A-13. DB Opcode Map When ModR/M Byte is Within 00H to BFH *. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-23

Table A-14. DB Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-24

Table A-15. DC Opcode Map When ModR/M Byte is Within 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-24

Table A-16. DC Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-25

Table A-17. DD Opcode Map When ModR/M Byte is Within 00H to BFH *. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-25

Table A-18. DD Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-26

Table A-19. DE Opcode Map When ModR/M Byte is Within 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-26

Table A-20. DE Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-27

Table A-21. DF Opcode Map When ModR/M Byte is Within 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-27

Table A-22. DF Opcode Map When ModR/M Byte is Outside 00H to BFH * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-28

Table B-1. Special Fields Within Instruction Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2

Table B-2. Encoding of reg Field When w Field is Not Present in Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3

Table B-3. Encoding of reg Field When w Field is Present in Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3

Table B-4. Encoding of reg Field When w Field is Not Present in Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4

Table B-5. Encoding of reg Field When w Field is Present in Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4 Table B-6. Encoding of Operand Size (w) Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4

Table B-7. Encoding of Sign-Extend (s) Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5

Table B-8. Encoding of the Segment Register (sreg) Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5

Table B-9. Encoding of Special-Purpose Register (eee) Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5

Table B-10. Encoding of Conditional Test (tttn) Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-6

Table B-11. Encoding of Operation Direction (d) Bit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-6

Table B-13. General Purpose Instruction Formats and Encodings for Non-64-Bit Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-7 Table B-12. Notes on Instruction Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-7

Table B-14. Special Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-18

Table B-15. General Purpose Instruction Formats and Encodings for 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-18

Table B-16. Pentium Processor Family Instruction Formats and Encodings, Non-64-Bit Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-38

Table B-17. Pentium Processor Family Instruction Formats and Encodings, 64-Bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-38 Table B-18. Encoding of Granularity of Data Field (gg) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-39

Table B-19. MMX Instruction Formats and Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-39

Table B-20. Formats and Encodings of XSAVE/XRSTOR/XGETBV/XSETBV Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-42

Table B-21. Formats and Encodings of P6 Family Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-42

Table B-22. Formats and Encodings of SSE Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-43

Table B-23. Formats and Encodings of SSE Integer Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-48

Table B-25. Encoding of Granularity of Data Field (gg) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-49

Table B-24. Format and Encoding of SSE Cacheability & Memory Ordering Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-49

Table B-26. Formats and Encodings of SSE2 Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-50

Table B-27. Formats and Encodings of SSE2 Integer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-55

Table B-28. Format and Encoding of SSE2 Cacheability Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-59

Table B-29. Formats and Encodings of SSE3 Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-60

Table B-30. Formats and Encodings for SSE3 Event Management Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-60

Table B-31. Formats and Encodings for SSE3 Integer and Move Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-61

Table B-32. Formats and Encodings for SSSE3 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-61

Table B-33. Formats and Encodings of AESNI and PCLMULQDQ Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-64

Table B-34. Special Case Instructions Promoted Using REX.W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-65

Table B-35. Encodings of SSE4.1 instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-67

Table B-36. Encodings of SSE4.2 instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-73

Table B-37. Encodings of AVX instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-74

Table B-38. General Floating-Point Instruction Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-114

Table B-39. Floating-Point Instruction Formats and Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-114

Table B-40. Encodings for VMX Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-118

Table B-41. Encodings for SMX Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-119

Table C-1. Simple Intrinsics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2

Table C-2. Composite Intrinsics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-14

CHAPTER 1 ABOUT THIS MANUAL

image


The Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volumes 2A, 2B, 2C & 2D: Instruction Set Reference (order numbers 253666, 253667, 326018 and 334569) are part of a set that describes the architecture and programming environment of all Intel 64 and IA-32 architecture processors. Other volumes in this set are:

The VEX prefix is required to be the last prefix and immediately precedes the opcode bytes. It must follow any other prefixes. If VEX prefix is present a REX prefix is not supported.

The 3-byte VEX leaves room for future expansion with 3 reserved bits. REX and the 66h/F2h/F3h prefixes are reclaimed for future use.

VEX prefix has a two-byte form and a three byte form. If an instruction syntax can be encoded using the two-byte form, it can also be encoded using the three byte form of VEX. The latter increases the length of the instruction by one byte. This may be helpful in some situations for code alignment.

The VEX prefix supports 256-bit versions of floating-point SSE, SSE2, SSE3, and SSE4 instructions. Note, certain new instruction functionality can only be encoded with the VEX prefix.

The VEX prefix will #UD on any instruction containing MMX register sources or destinations.



(Bit Position) 7

Byte 0


0



7 6 5

Byte 1

4


0



7


6

Byte 2

3


2


1


0

3-byte VEX


11000100



R X B


m-mmmm



W



vvvv


L


pp

1

1

image

image

image

7 0 7 6 3 2 1 0


2-byte VEX



11000101


R


vvvv


L


pp

R: REX.R in 1’s complement (inverted) form

1: Same as REX.R=0 (must be 1 in 32-bit mode) 0: Same as REX.R=1 (64-bit mode only)

X: REX.X in 1’s complement (inverted) form

1: Same as REX.X=0 (must be 1 in 32-bit mode) 0: Same as REX.X=1 (64-bit mode only)

B: REX.B in 1’s complement (inverted) form

1: Same as REX.B=0 (Ignored in 32-bit mode). 0: Same as REX.B=1 (64-bit mode only)

W: opcode specific (use like REX.W, or used for opcode extension, or ignored, depending on the opcode byte)

    1. mmm:

      00000: Reserved for future use (will #UD) 00001: implied 0F leading opcode byte 00010: implied 0F 38 leading opcode bytes 00011: implied 0F 3A leading opcode bytes

      00100-11111: Reserved for future use (will #UD)


      vvvv: a register specifier (in 1’s complement form) or 1111 if unused.

      L: Vector Length

      0: scalar or 128-bit vector 1: 256-bit vector


      pp: opcode extension providing equivalent functionality of a SIMD prefix 00: None

      01: 66

      10: F3

      11: F2


      Figure 2-9. VEX bit fields


      The following subsections describe the various fields in two or three-byte VEX prefix.


            1. VEX Byte 0, bits[7:0]

              VEX Byte 0, bits [7:0] must contain the value 11000101b (C5h) or 11000100b (C4h). The 3-byte VEX uses the C4h first byte, while the 2-byte VEX uses the C5h first byte.


            2. VEX Byte 1, bit [7] - ‘R’

              VEX Byte 1, bit [7] contains a bit analogous to a bit inverted REX.R. In protected and compatibility modes the bit must be set to ‘1’ otherwise the instruction is LES or LDS.



              This bit is present in both 2- and 3-byte VEX prefixes.

              The usage of WRXB bits for legacy instructions is explained in detail section 2.2.1.2 of Intel 64 and IA-32 Architec- tures Software developer’s manual, Volume 2A.

              This bit is stored in bit inverted format.


            3. 3-byte VEX byte 1, bit[6] - ‘X’

              Bit[6] of the 3-byte VEX byte 1 encodes a bit analogous to a bit inverted REX.X. It is an extension of the SIB Index field in 64-bit modes. In 32-bit modes, this bit must be set to ‘1’ otherwise the instruction is LES or LDS.

              This bit is available only in the 3-byte VEX prefix. This bit is stored in bit inverted format.


            4. 3-byte VEX byte 1, bit[5] - ‘B’

              Bit[5] of the 3-byte VEX byte 1 encodes a bit analogous to a bit inverted REX.B. In 64-bit modes, it is an extension of the ModR/M r/m field, or the SIB base field. In 32-bit modes, this bit is ignored.

              This bit is available only in the 3-byte VEX prefix. This bit is stored in bit inverted format.


            5. 3-byte VEX byte 2, bit[7] - ‘W’

              Bit[7] of the 3-byte VEX byte 2 is represented by the notation VEX.W. It can provide following functions, depending on the specific opcode.

              • For AVX instructions that have equivalent legacy SSE instructions (typically these SSE instructions have a general-purpose register operand with its operand size attribute promotable by REX.W), if REX.W promotes the operand size attribute of the general-purpose register operand in legacy SSE instruction, VEX.W has same meaning in the corresponding AVX equivalent form. In 32-bit modes for these instructions, VEX.W is silently ignored.

              • For AVX instructions that have equivalent legacy SSE instructions (typically these SSE instructions have oper- ands with their operand size attribute fixed and not promotable by REX.W), if REX.W is don’t care in legacy SSE instruction, VEX.W is ignored in the corresponding AVX equivalent form irrespective of mode.

              • For new AVX instructions where VEX.W has no defined function (typically these meant the combination of the opcode byte and VEX.mmmmm did not have any equivalent SSE functions), VEX.W is reserved as zero and setting to other than zero will cause instruction to #UD.


        1. 2-byte VEX Byte 1, bits[6:3] and 3-byte VEX Byte 2, bits [6:3]- ‘vvvv’ the Source or Dest Register Specifier

          In 32-bit mode the VEX first byte C4 and C5 alias onto the LES and LDS instructions. To maintain compatibility with existing programs the VEX 2nd byte, bits [7:6] must be 11b. To achieve this, the VEX payload bits are selected to place only inverted, 64-bit valid fields (extended register selectors) in these upper bits.

          The 2-byte VEX Byte 1, bits [6:3] and the 3-byte VEX, Byte 2, bits [6:3] encode a field (shorthand VEX.vvvv) that for instructions with 2 or more source registers and an XMM or YMM or memory destination encodes the first source register specifier stored in inverted (1’s complement) form.

          VEX.vvvv is not used by the instructions with one source (except certain shifts, see below) or on instructions with no XMM or YMM or memory destination. If an instruction does not use VEX.vvvv then it should be set to 1111b otherwise instruction will #UD.

          In 64-bit mode all 4 bits may be used. See Table 2-8 for the encoding of the XMM or YMM registers. In 32-bit and 16-bit modes bit 6 must be 1 (if bit 6 is not 1, the 2-byte VEX version will generate LDS instruction and the 3-byte VEX version will ignore this bit).



          Table 2-8. VEX.vvvv to register name mapping

          VEX.vvvv

          Dest Register

          Valid in Legacy/Compatibility 32-bit modes?

          1111B

          XMM0/YMM0

          Valid

          1110B

          XMM1/YMM1

          Valid

          1101B

          XMM2/YMM2

          Valid

          1100B

          XMM3/YMM3

          Valid

          1011B

          XMM4/YMM4

          Valid

          1010B

          XMM5/YMM5

          Valid

          1001B

          XMM6/YMM6

          Valid

          1000B

          XMM7/YMM7

          Valid

          0111B

          XMM8/YMM8

          Invalid

          0110B

          XMM9/YMM9

          Invalid

          0101B

          XMM10/YMM10

          Invalid

          0100B

          XMM11/YMM11

          Invalid

          0011B

          XMM12/YMM12

          Invalid

          0010B

          XMM13/YMM13

          Invalid

          0001B

          XMM14/YMM14

          Invalid

          0000B

          XMM15/YMM15

          Invalid


          The VEX.vvvv field is encoded in bit inverted format for accessing a register operand.


      1. Instruction Operand Encoding and VEX.vvvv, ModR/M

        VEX-encoded instructions support three-operand and four-operand instruction syntax. Some VEX-encoded instructions have syntax with less than three operands, e.g. VEX-encoded pack shift instructions support one source operand and one destination operand).

        The roles of VEX.vvvv, reg field of ModR/M byte (ModR/M.reg), r/m field of ModR/M byte (ModR/M.r/m) with respect to encoding destination and source operands vary with different type of instruction syntax.

        The role of VEX.vvvv can be summarized to three situations:

        • VEX.vvvv encodes the first source register operand, specified in inverted (1’s complement) form and is valid for instructions with 2 or more source operands.

        • VEX.vvvv encodes the destination register operand, specified in 1’s complement form for certain vector shifts. The instructions where VEX.vvvv is used as a destination are listed in Table 2-9. The notation in the “Opcode” column in Table 2-9 is described in detail in section 3.1.1.

        • VEX.vvvv does not encode any operand, the field is reserved and should contain 1111b.

          Table 2-9. Instructions with a VEX.vvvv destination

          Opcode

          Instruction mnemonic

          VEX.NDD.128.66.0F 73 /7 ib

          VPSLLDQ xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 73 /3 ib

          VPSRLDQ xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 71 /2 ib

          VPSRLW xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 72 /2 ib

          VPSRLD xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 73 /2 ib

          VPSRLQ xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 71 /4 ib

          VPSRAW xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 72 /4 ib

          VPSRAD xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 71 /6 ib

          VPSLLW xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 72 /6 ib

          VPSLLD xmm1, xmm2, imm8

          VEX.NDD.128.66.0F 73 /6 ib

          VPSLLQ xmm1, xmm2, imm8



          The role of ModR/M.r/m field can be summarized to two situations:

        • ModR/M.r/m encodes the instruction operand that references a memory address.

        • For some instructions that do not support memory addressing semantics, ModR/M.r/m encodes either the destination register operand or a source register operand.

          The role of ModR/M.reg field can be summarized to two situations:

        • ModR/M.reg encodes either the destination register operand or a source register operand.

        • For some instructions, ModR/M.reg is treated as an opcode extension and not used to encode any instruction operand.

          For instruction syntax that support four operands, VEX.vvvv, ModR/M.r/m, ModR/M.reg encodes three of the four operands. The role of bits 7:4 of the immediate byte serves the following situation:

        • Imm8[7:4] encodes the third source register operand.


          1. 3-byte VEX byte 1, bits[4:0] - “m-mmmm”

            Bits[4:0] of the 3-byte VEX byte 1 encode an implied leading opcode byte (0F, 0F 38, or 0F 3A). Several bits are reserved for future use and will #UD unless 0.


            Table 2-10. VEX.m-mmmm interpretation

            VEX.m-mmmm

            Implied Leading Opcode Bytes

            00000B

            Reserved

            00001B

            0F

            00010B

            0F 38

            00011B

            0F 3A

            00100-11111B

            Reserved

            (2-byte VEX)

            0F


            VEX.m-mmmm is only available on the 3-byte VEX. The 2-byte VEX implies a leading 0Fh opcode byte.


          2. 2-byte VEX byte 1, bit[2], and 3-byte VEX byte 2, bit [2]- “L”

            The vector length field, VEX.L, is encoded in bit[2] of either the second byte of 2-byte VEX, or the third byte of 3- byte VEX. If “VEX.L = 1”, it indicates 256-bit vector operation. “VEX.L = 0” indicates scalar and 128-bit vector operations.

            The instruction VZEROUPPER is a special case that is encoded with VEX.L = 0, although its operation zero’s bits 255:128 of all YMM registers accessible in the current operating mode.

            See the following table.


            Table 2-11. VEX.L interpretation

            VEX.L

            Vector Length

            0

            128-bit (or 32/64-bit scalar)

            1

            256-bit


          3. 2-byte VEX byte 1, bits[1:0], and 3-byte VEX byte 2, bits [1:0]- “pp”

            Up to one implied prefix is encoded by bits[1:0] of either the 2-byte VEX byte 1 or the 3-byte VEX byte 2. The prefix behaves as if it was encoded prior to VEX, but after all other encoded prefixes.

            See the following table.



            Table 2-12. VEX.pp interpretation

            pp

            Implies this prefix after other prefixes but before VEX

            00B

            None

            01B

            66

            10B

            F3

            11B

            F2


      2. The Opcode Byte

        One (and only one) opcode byte follows the 2 or 3 byte VEX. Legal opcodes are specified in Appendix B, in color. Any instruction that uses illegal opcode will #UD.


      3. The MODRM, SIB, and Displacement Bytes

        The encodings are unchanged but the interpretation of reg_field or rm_field differs (see above).


      4. The Third Source Operand (Immediate Byte)

        VEX-encoded instructions can support instruction with a four operand syntax. VBLENDVPD, VBLENDVPS, and PBLENDVB use imm8[7:4] to encode one of the source registers.


      5. AVX Instructions and the Upper 128-bits of YMM registers

        If an instruction with a destination XMM register is encoded with a VEX prefix, the processor zeroes the upper bits (above bit 128) of the equivalent YMM register. Legacy SSE instructions without VEX preserve the upper bits.


        1. Vector Length Transition and Programming Considerations

An instruction encoded with a VEX.128 prefix that loads a YMM register operand operates as follows:


The Compatibility/Legacy Mode support is to the right of the ‘slash’ and has the following notation:


        1. CPUID Support Column in the Instruction Summary Table

          The fourth column holds abbreviated CPUID feature flags (e.g., appropriate bit in CPUID.1.ECX, CPUID.1.EDX for SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AESNI/PCLMULQDQ/AVX/RDRAND support) that indicate processor support for the instruction. If the corresponding flag is ‘0’, the instruction will #UD.


        2. Description Column in the Instruction Summary Table

          The “Description” column briefly explains forms of the instruction.


        3. Description Section

          Each instruction is then described by number of information sections. The “Description” section describes the purpose of the instructions and required operands in more detail.

          Summary of terms that may be used in the description section:

          • Legacy SSE — Refers to SSE, SSE2, SSE3, SSSE3, SSE4, AESNI, PCLMULQDQ and any future instruction sets referencing XMM registers and encoded without a VEX prefix.

          • VEX.vvvv — The VEX bit field specifying a source or destination register (in 1’s complement form).

          • rm_field — shorthand for the ModR/M r/m field and any REX.B

          • reg_field — shorthand for the ModR/M reg field and any REX.R


        4. Operation Section

          The “Operation” section contains an algorithm description (frequently written in pseudo-code) for the instruction. Algorithms are composed of the following elements:

          • Comments are enclosed within the symbol pairs “(*” and “*)”.

          • Compound statements are enclosed in keywords, such as: IF, THEN, ELSE and FI for an if statement; DO and OD for a do statement; or CASE... OF for a case statement.

          • A register name implies the contents of the register. A register name enclosed in brackets implies the contents of the location whose address is contained in that register. For example, ES:[DI] indicates the contents of the location whose ES segment relative address is in register DI. [SI] indicates the contents of the address contained in register SI relative to the SI register’s default segment (DS) or the overridden segment.

          • Parentheses around the “E” in a general-purpose register name, such as (E)SI, indicates that the offset is read from the SI register if the address-size attribute is 16, from the ESI register if the address-size attribute is 32. Parentheses around the “R” in a general-purpose register name, (R)SI, in the presence of a 64-bit register definition such as (R)SI, indicates that the offset is read from the 64-bit RSI register if the address-size attribute is 64.

          • Brackets are used for memory operands where they mean that the contents of the memory location is a segment-relative offset. For example, [SRC] indicates that the content of the source operand is a segment- relative offset.

          • A B indicates that the value of B is assigned to A.

          • The symbols =, , , <, , and are relational operators used to compare two values: meaning equal, not equal, greater or equal, less or equal, respectively. A relational expression such as A B is TRUE if the value of A is equal to B; otherwise it is FALSE.


          • The expression “« COUNT” and “» COUNT” indicates that the destination operand should be shifted left or right by the number of bits indicated by the count operand.

            The following identifiers are used in the algorithmic descriptions:

          • OperandSize and AddressSize — The OperandSize identifier represents the operand-size attribute of the instruction, which is 16, 32 or 64-bits. The AddressSize identifier represents the address-size attribute, which is 16, 32 or 64-bits. For example, the following pseudo-code indicates that the operand-size attribute depends on the form of the MOV instruction used.

            IF Instruction MOVW

            THEN OperandSize 16; ELSE

            IF Instruction MOVD

            THEN OperandSize 32; ELSE

            IF Instruction MOVQ

            THEN OperandSize 64;


            FI;


            FI;

            FI;

            See “Operand-Size and Address-Size Attributes” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for guidelines on how these attributes are determined.

          • StackAddrSize — Represents the stack address-size attribute associated with the instruction, which has a value of 16, 32 or 64-bits. See “Address-Size Attribute for Stack” in Chapter 6, “Procedure Calls, Interrupts, and Exceptions,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.

          • SRC — Represents the source operand.

          • DEST — Represents the destination operand.

          • MAXVL — The maximum vector register width pertaining to the instruction. This is not the vector-length encoding in the instruction's encoding but is instead determined by the current value of XCR0. For details, refer to the table below. Note that the value of MAXVL is the largest of the features enabled. Future processors may define new bits in XCR0 whose setting may imply other values for MAXVL.


            MAXVL Definition

            XCR0 Component

            MAXVL

            XCR0.SSE

            128

            XCR0.AVX

            256

            XCR0.{ZMM_Hi256, Hi16_ZMM, OPMASK}

            512

            The following functions are used in the algorithmic descriptions:

          • ZeroExtend(value) — Returns a value zero-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, zero extending a byte value of –10 converts the byte from F6H to a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.

          • SignExtend(value) — Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value –10 converts the byte from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand- size attribute are the same size, SignExtend returns the value unaltered.

          • SaturateSignedWordToSignedByte — Converts a signed 16-bit value to a signed 8-bit value. If the signed 16-bit value is less than –128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).


          • SaturateSignedDwordToSignedWord — Converts a signed 32-bit value to a signed 16-bit value. If the signed 32-bit value is less than –32768, it is represented by the saturated value –32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).

          • SaturateSignedWordToUnsignedByte — Converts a signed 16-bit value to an unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).

          • SaturateToSignedByte — Represents the result of an operation as a signed 8-bit value. If the result is less than –128, it is represented by the saturated value –128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).

          • SaturateToSignedWord — Represents the result of an operation as a signed 16-bit value. If the result is less than –32768, it is represented by the saturated value –32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).

          • SaturateToUnsignedByte — Represents the result of an operation as a signed 8-bit value. If the result is less than zero it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).

          • SaturateToUnsignedWord — Represents the result of an operation as a signed 16-bit value. If the result is less than zero it is represented by the saturated value zero (00H); if it is greater than 65535, it is represented by the saturated value 65535 (FFFFH).

          • LowOrderWord(DEST * SRC) — Multiplies a word operand by a word operand and stores the least significant word of the doubleword result in the destination operand.

          • HighOrderWord(DEST * SRC) — Multiplies a word operand by a word operand and stores the most significant word of the doubleword result in the destination operand.

          • Push(value) — Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction. See the “Operation” subsection of the “PUSH—Push Word, Doubleword or Quadword Onto the Stack” section in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.

          • Pop() — removes the value from the top of the stack and returns it. The statement EAX Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop will return either a word, a doubleword or a quadword depending on the operand-size attribute. See the “Operation” subsection in the “POP—Pop a Value from the Stack” section of Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.

          • PopRegisterStack — Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.

          • Switch-Tasks — Performs a task switch.

            image

          • Bit(BitBase, BitOffset) — Returns the value of a bit within a bit string. The bit string is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the BitBase is a register, the BitOffset can be in the range 0 to [15, 31, 63] depending on the mode and register size. See Figure 3-1: the function Bit[RAX, 21] is illustrated.



            63

            31

            21

            0


            Bit Offset 21

            Figure 3-1. Bit Offset for BIT[RAX, 21]



            If BitBase is a memory address, the BitOffset has different ranges depending on the operand size (see Table 3-2).


            Table 3-2. Range of Bit Positions Specified by Bit Offset Operands

            Operand Size

            Immediate BitOffset

            Register BitOffset

            16

            0 to 15

            215 to 215 1

            32

            0 to 31

            231 to 231 1

            64

            0 to 63

            263 to 263 1

            The addressed bit is numbered (Offset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)) where DIV is signed division with rounding towards negative infinity and MOD returns a positive number (see

            Figure 3-2).


            image

            7 5 0 7 0 7 0







            BitBase  BitBase BitBase 


            BitOffset 13


            7 0 7 0 7 5 0







            BitBase

            BitBase

            BitBase


            BitOffset 11


            Figure 3-2. Memory Bit Indexing


        5. Intel® C/C Compiler Intrinsics Equivalents Section

          The Intel C/C compiler intrinsic functions give access to the full power of the Intel Architecture Instruction Set, while allowing the compiler to optimize register allocation and instruction scheduling for faster execution. Most of these functions are associated with a single IA instruction, although some may generate multiple instructions or different instructions depending upon how they are used. In particular, these functions are used to invoke instruc- tions that perform operations on vector registers that can hold multiple data elements. These SIMD instructions use the following data types.

          • m128, m256 and m512 can represent 4, 8 or 16 packed single-precision floating-point values, and are used with the vector registers and SSE, AVX, or AVX-512 instruction set extension families. The m128 data type is also used with various single-precision floating-point scalar instructions that perform calculations using only the lowest 32 bits of a vector register; the remaining bits of the result come from one of the sources or are set to zero depending upon the instruction.

          • m128d, m256d and m512d can represent 2, 4 or 8 packed double-precision floating-point values, and are used with the vector registers and SSE, AVX, or AVX-512 instruction set extension families. The m128d data type is also used with various double-precision floating-point scalar instructions that perform calculations using only the lowest 64 bits of a vector register; the remaining bits of the result come from one of the sources or are set to zero depending upon the instruction.

          • m128i, m256i and m512i can represent integer data in bytes, words, doublewords, quadwords, and occasionally larger data types.



            Each of these data types incorporates in its name the number of bits it can hold. For example, the m128 type holds 128 bits, and because each single-precision floating-point value is 32 bits long the m128 type holds (128/32) or four values. Normally the compiler will allocate memory for these data types on an even multiple of the size of the type. Such aligned memory locations may be faster to read and write than locations at other addresses.

            These SIMD data types are not basic Standard C data types or C objects, so they may be used only with the assignment operator, passed as function arguments, and returned from a function call. If you access the internal members of these types directly, or indirectly by using them in a union, there may be side effects affecting optimi- zation, so it is recommended to use them only with the SIMD instruction intrinsic functions described in this manual or the Intel C/C compiler documentation.

            Many intrinsic functions names are prefixed with an indicator of the vector length and suffixed by an indicator of the vector element data type, although some functions do not follow the rules below. The prefixes are:

          • _mm_ indicates that the function operates on 128-bit (or sometimes 64-bit) vectors.

          • _mm256_ indicates the function operates on 256-bit vectors.

          • _mm512_ indicates that the function operates on 512-bit vectors. The suffixes include:

          • _ps, which indicates a function that operates on packed single-precision floating-point data. Packed single-

            precision floating-point data corresponds to arrays of the C/C type float with either 4, 8 or 16 elements. Values of this type can be loaded from an array using the _mm_loadu_ps, _mm256_loadu_ps, or

            _mm512_loadu_ps functions, or created from individual values using _mm_set_ps, _mm256_set_ps, or

            _mm512_set_ps functions, and they can be stored in an array using _mm_storeu_ps, _mm256_storeu_ps, or

            _mm512_storeu_ps.

          • _ss, which indicates a function that operates on scalar single-precision floating-point data. Single-precision floating-point data corresponds to the C/C type float, and values of type float can be converted to type

            m128 for use with these functions using the _mm_set_ss function, and converted back using the

            _mm_cvtss_f32 function. When used with functions that operate on packed single-precision floating-point data the scalar element corresponds with the first packed value.

          • _pd, which indicates a function that operates on packed double-precision floating-point data. Packed double- precision floating-point data corresponds to arrays of the C/C type double with either 2, 4, or 8 elements. Values of this type can be loaded from an array using the _mm_loadu_pd, _mm256_loadu_pd, or

            _mm512_loadu_pd functions, or created from individual values using _mm_set_pd, _mm2566_set_pd, or

            _mm512_set_pd functions, and they can be stored in an array using _mm_storeu_pd, _mm256_storeu_pd, or

            _mm512_storeu_pd.

          • _sd, which indicates a function that operates on scalar double-precision floating-point data. Double-precision floating-point data corresponds to the C/C type double, and values of type double can be converted to type

            m128d for use with these functions using the _mm_set_sd function, and converted back using the

            _mm_cvtsd_f64 function. When used with functions that operate on packed double-precision floating-point data the scalar element corresponds with the first packed value.

          • _epi8, which indicates a function that operates on packed 8-bit signed integer values. Packed 8-bit signed integers correspond to an array of signed char with 16, 32 or 64 elements. Values of this type can be created from individual elements using _mm_set_epi8, _mm256_set_epi8, or _mm512_set_epi8 functions.

          • _epi16, which indicates a function that operates on packed 16-bit signed integer values. Packed 16-bit signed integers correspond to an array of short with 8, 16 or 32 elements. Values of this type can be created from individual elements using _mm_set_epi16, _mm256_set_epi16, or _mm512_set_epi16 functions.

          • _epi32, which indicates a function that operates on packed 32-bit signed integer values. Packed 32-bit signed integers correspond to an array of int with 4, 8 or 16 elements. Values of this type can be created from individual elements using _mm_set_epi32, _mm256_set_epi32, or _mm512_set_epi32 functions.

          • _epi64, which indicates a function that operates on packed 64-bit signed integer values. Packed 64-bit signed integers correspond to an array of long long (or long if it is a 64-bit data type) with 2, 4 or 8 elements. Values of this type can be created from individual elements using _mm_set_epi32, _mm256_set_epi32, or

            _mm512_set_epi32 functions.

          • _epu8, which indicates a function that operates on packed 8-bit unsigned integer values. Packed 8-bit unsigned integers correspond to an array of unsigned char with 16, 32 or 64 elements.


          • _epu16, which indicates a function that operates on packed 16-bit unsigned integer values. Packed 16-bit unsigned integers correspond to an array of unsigned short with 8, 16 or 32 elements.

          • _epu32, which indicates a function that operates on packed 32-bit unsigned integer values. Packed 32-bit unsigned integers correspond to an array of unsigned with 4, 8 or 16 elements.

          • _epu64, which indicates a function that operates on packed 64-bit unsigned integer values. Packed 64-bit unsigned integers correspond to an array of unsigned long long (or unsigned long if it is a 64-bit data type) with 2, 4 or 8 elements.

          • _si128, which indicates a function that operates on a single 128-bit value of type m128i.

          • _si256, which indicates a function that operates on a single a 256-bit value of type m256i.

          • _si512, which indicates a function that operates on a single a 512-bit value of type m512i.

            Values of any packed integer type can be loaded from an array using the _mm_loadu_si128,

            _mm256_loadu_si256, or _mm512_loadu_si512 functions, and they can be stored in an array using

            _mm_storeu_si128, _mm256_storeu_si256, or _mm512_storeu_si512.

            These functions and data types are used with the SSE, AVX, and AVX-512 instruction set extension families. In addition there are similar functions that correspond to MMX instructions. These are less frequently used because they require additional state management, and only operate on 64-bit packed integer values.

            The declarations of Intel C/C compiler intrinsic functions may reference some non-standard data types, such as

            int64. The C Standard header stdint.h defines similar platform-independent types, and the documentation for that header gives characteristics that apply to corresponding non-standard types according to the following table.


            Table 3-3. Standard and Non-standard Data Types

            Non-standard Type

            Standard Type (from stdint.h)

            int64

            int64_t

            unsigned int64

            uint64_t

            int32

            int32_t

            unsigned int32

            uint32_t

            int16

            int16_t

            unsigned int16

            uint16_t

            For a more detailed description of each intrinsic function and additional information related to its usage, refer to the online Intel Intrinsics Guide, https://software.intel.com/sites/landingpage/IntrinsicsGuide.


        6. Flags Affected Section

          The “Flags Affected” section lists the flags in the EFLAGS register that are affected by the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1. The arithmetic and logical instructions usually assign values to the status flags in a uniform manner (see Appendix A, “EFLAGS Cross-Reference,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1). Non-conventional assignments are described in the “Oper- ation” section. The values of flags listed as undefined may be changed by the instruction in an indeterminate manner. Flags that are not listed are unchanged by the instruction.


        7. FPU Flags Affected Section

          The floating-point instructions have an “FPU Flags Affected” section that describes how each instruction can affect the four condition code flags of the FPU status word.


        8. Protected Mode Exceptions Section

          The “Protected Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in protected mode and the reasons for the exceptions. Each exception is given a mnemonic that consists of a pound



          sign (#) followed by two letters and an optional error code in parentheses. For example, #GP(0) denotes a general protection exception with an error code of 0. Table 3-4 associates each two-letter mnemonic with the corre- sponding exception vector and name. See Chapter 6, “Procedure Calls, Interrupts, and Exceptions,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for a detailed description of the exceptions.

          Application programmers should consult the documentation provided with their operating systems to determine the actions taken when exceptions occur.


          Table 3-4. Intel 64 and IA-32 General Exceptions

          Vector

          Name

          Source

          Protected Mode1

          Real Address Mode

          Virtual 8086

          Mode

          0

          #DE—Divide Error

          DIV and IDIV instructions.

          Yes

          Yes

          Yes

          1

          #DB—Debug

          Any code or data reference.

          Yes

          Yes

          Yes

          3

          #BP—Breakpoint

          INT3 instruction.

          Yes

          Yes

          Yes

          4

          #OF—Overflow

          INTO instruction.

          Yes

          Yes

          Yes

          5

          #BR—BOUND Range Exceeded

          BOUND instruction.

          Yes

          Yes

          Yes

          6

          #UD—Invalid Opcode (Undefined Opcode)

          UD instruction or reserved opcode.

          Yes

          Yes

          Yes

          7

          #NM—Device Not Available (No Math Coprocessor)

          Floating-point or WAIT/FWAIT instruction.

          Yes

          Yes

          Yes

          8

          #DF—Double Fault

          Any instruction that can generate an exception, an NMI, or an INTR.

          Yes

          Yes

          Yes

          10

          #TS—Invalid TSS

          Task switch or TSS access.

          Yes

          Reserved

          Yes

          11

          #NP—Segment Not Present

          Loading segment registers or accessing system segments.

          Yes

          Reserved

          Yes

          12

          #SS—Stack Segment Fault

          Stack operations and SS register loads.

          Yes

          Yes

          Yes

          13

          #GP—General Protection2

          Any memory reference and other protection checks.

          Yes

          Yes

          Yes

          14

          #PF—Page Fault

          Any memory reference.

          Yes

          Reserved

          Yes

          16

          #MF—Floating-Point Error (Math Fault)

          Floating-point or WAIT/FWAIT instruction.

          Yes

          Yes

          Yes

          17

          #AC—Alignment Check

          Any data reference in memory.

          Yes

          Reserved

          Yes

          18

          #MC—Machine Check

          Model dependent machine check errors.

          Yes

          Yes

          Yes

          19

          #XM—SIMD Floating-Point Numeric Error

          SSE/SSE2/SSE3 floating-point instructions.

          Yes

          Yes

          Yes

          NOTES:

          1. Apply to protected mode, compatibility mode, and 64-bit mode.

          2. In the real-address mode, vector 13 is the segment overrun exception.


        9. Real-Address Mode Exceptions Section

          The “Real-Address Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in real-address mode (see Table 3-4).


        10. Virtual-8086 Mode Exceptions Section

          The “Virtual-8086 Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in virtual-8086 mode (see Table 3-4).


        11. Floating-Point Exceptions Section

          The “Floating-Point Exceptions” section lists exceptions that can occur when an x87 FPU floating-point instruction is executed. All of these exception conditions result in a floating-point error exception (#MF, exception 16) being generated. Table 3-5 associates a one- or two-letter mnemonic with the corresponding exception name. See “Floating-Point Exception Conditions” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a detailed description of these exceptions.


          Table 3-5. x87 FPU Floating-Point Exceptions

          Mnemonic

          Name

          Source


          #IS

          #IA

          Floating-point invalid operation:


          #Z

          Floating-point divide-by-zero

          Divide-by-zero

          #D

          Floating-point denormal operand

          Source operand that is a denormal number

          #O

          Floating-point numeric overflow

          Overflow in result

          #U

          Floating-point numeric underflow

          Underflow in result

          #P

          Floating-point inexact result (precision)

          Inexact result (precision)

          • Stack overflow or underflow

          • Invalid arithmetic operation

          • x87 FPU stack overflow or underflow

          • Invalid FPU arithmetic operation


        12. SIMD Floating-Point Exceptions Section

          The “SIMD Floating-Point Exceptions” section lists exceptions that can occur when an SSE/SSE2/SSE3 floating- point instruction is executed. All of these exception conditions result in a SIMD floating-point error exception (#XM, exception 19) being generated. Table 3-6 associates a one-letter mnemonic with the corresponding exception name. For a detailed description of these exceptions, refer to ”SSE and SSE2 Exceptions”, in Chapter 11 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.

          Table 3-6. SIMD Floating-Point Exceptions

          Mnemonic

          Name

          Source

          #I

          Floating-point invalid operation

          Invalid arithmetic operation or source operand

          #Z

          Floating-point divide-by-zero

          Divide-by-zero

          #D

          Floating-point denormal operand

          Source operand that is a denormal number

          #O

          Floating-point numeric overflow

          Overflow in result

          #U

          Floating-point numeric underflow

          Underflow in result

          #P

          Floating-point inexact result

          Inexact result (precision)


        13. Compatibility Mode Exceptions Section

          This section lists exceptions that occur within compatibility mode.


        14. 64-Bit Mode Exceptions Section

          This section lists exceptions that occur within 64-bit mode.


    1. INSTRUCTIONS (A-L)

The remainder of this chapter provides descriptions of Intel 64 and IA-32 instructions (A-L). See also: Chapter 4, “Instruction Set Reference, M-U,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, and Chapter 5, “Instruction Set Reference, V-Z,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2C.


AAA—ASCII Adjust After Addition

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

37

AAA

ZO

Invalid

Valid

ASCII adjust AL after addition.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

ZO

NA

NA

NA

NA

Description

Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.


Operation

IF 64-Bit Mode THEN

#UD; ELSE

IF ((AL AND 0FH) 9) or (AF 1)

THEN

AX AX 106H; AF 1;

CF 1; ELSE

AF 0;

CF 0;


FI;

FI;

AL AL AND 0FH;


Flags Affected

The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are set to 0. The OF, SF, ZF, and PF flags are undefined.


Protected Mode Exceptions

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

Same exceptions as protected mode.


Virtual-8086 Mode Exceptions

Same exceptions as protected mode.



Compatibility Mode Exceptions

Same exceptions as protected mode.


64-Bit Mode Exceptions

#UD If in 64-bit mode.


AAD—ASCII Adjust AX Before Division

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

D5 0A

AAD

ZO

Invalid

Valid

ASCII adjust AX before division.

D5 ib

AAD imm8

ZO

Invalid

Valid

Adjust AX before division to number base

imm8.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

ZO

NA

NA

NA

NA

Description

Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AX register by an unpacked BCD value.

The AAD instruction sets the value in the AL register to (AL (10 * AH)), and then clears the AH register to 00H. The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit (base 10) number in registers AH and AL.

The generalized version of this instruction allows adjustment of two unpacked digits of any number base (see the “Operation” section below), by setting the imm8 byte to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAD mnemonic is interpreted by all assemblers to mean adjust ASCII (base 10) values. To adjust values in another number base, the instruction must be hand coded in machine code (D5 imm8).

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.


Operation

IF 64-Bit Mode THEN

#UD; ELSE

tempAL AL; tempAH AH;

AL (tempAL (tempAH imm8)) AND FFH; (* imm8 is set to 0AH for the AAD mnemonic.*) AH 0;

FI;

The immediate value (imm8) is taken from the second byte of the instruction.


Flags Affected

The SF, ZF, and PF flags are set according to the resulting binary value in the AL register; the OF, AF, and CF flags are undefined.


Protected Mode Exceptions

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

Same exceptions as protected mode.



Virtual-8086 Mode Exceptions

Same exceptions as protected mode.


Compatibility Mode Exceptions

Same exceptions as protected mode.


64-Bit Mode Exceptions

#UD If in 64-bit mode.


AAM—ASCII Adjust AX After Multiply

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

D4 0A

AAM

ZO

Invalid

Valid

ASCII adjust AX after multiply.

D4 ib

AAM imm8

ZO

Invalid

Valid

Adjust AX after multiply to number base

imm8.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

ZO

NA

NA

NA

NA

Description

Adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked (base 10) BCD values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is only useful when it follows an MUL instruction that multiplies (binary multiplication) two unpacked BCD values and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain the correct 2-digit unpacked (base 10) BCD result.

The generalized version of this instruction allows adjustment of the contents of the AX to create two unpacked digits of any number base (see the “Operation” section below). Here, the imm8 byte is set to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAM mnemonic is interpreted by all assemblers to mean adjust to ASCII (base 10) values. To adjust to values in another number base, the instruction must be hand coded in machine code (D4 imm8).

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.


Operation

IF 64-Bit Mode THEN

#UD; ELSE

tempAL AL;

AH tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *) AL tempAL MOD imm8;

FI;

The immediate value (imm8) is taken from the second byte of the instruction.


Flags Affected

The SF, ZF, and PF flags are set according to the resulting binary value in the AL register. The OF, AF, and CF flags are undefined.


Protected Mode Exceptions

#DE If an immediate value of 0 is used.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

Same exceptions as protected mode.


Virtual-8086 Mode Exceptions

Same exceptions as protected mode.



Compatibility Mode Exceptions

Same exceptions as protected mode.


64-Bit Mode Exceptions

#UD If in 64-bit mode.


AAS—ASCII Adjust AL After Subtraction

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

3F

AAS

ZO

Invalid

Valid

ASCII adjust AL after subtraction.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

ZO

NA

NA

NA

NA

Description

Adjusts the result of the subtraction of two unpacked BCD values to create a unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1- digit unpacked BCD result.

If the subtraction produced a decimal carry, the AH register decrements by 1, and the CF and AF flags are set. If no decimal carry occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL register is left with its top four bits set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.


Operation

IF 64-bit mode THEN

#UD; ELSE

IF ((AL AND 0FH) 9) or (AF 1)

THEN

AX AX – 6; AH AH – 1; AF 1;

CF 1;

AL AL AND 0FH; ELSE

CF 0;

AF 0;

AL AL AND 0FH;

FI;

FI;


Flags Affected

The AF and CF flags are set to 1 if there is a decimal borrow; otherwise, they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.


Protected Mode Exceptions

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

Same exceptions as protected mode.



Virtual-8086 Mode Exceptions

Same exceptions as protected mode.


Compatibility Mode Exceptions

Same exceptions as protected mode.


64-Bit Mode Exceptions

#UD If in 64-bit mode.


ADC—Add with Carry

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

14 ib

ADC AL, imm8

I

Valid

Valid

Add with carry imm8 to AL.

15 iw

ADC AX, imm16

I

Valid

Valid

Add with carry imm16 to AX.

15 id

ADC EAX, imm32

I

Valid

Valid

Add with carry imm32 to EAX.

REX.W + 15 id

ADC RAX, imm32

I

Valid

N.E.

Add with carry imm32 sign extended to 64- bits to RAX.

80 /2 ib

ADC r/m8, imm8

MI

Valid

Valid

Add with carry imm8 to r/m8.

REX + 80 /2 ib

ADC r/m8*, imm8

MI

Valid

N.E.

Add with carry imm8 to r/m8.

81 /2 iw

ADC r/m16, imm16

MI

Valid

Valid

Add with carry imm16 to r/m16.

81 /2 id

ADC r/m32, imm32

MI

Valid

Valid

Add with CF imm32 to r/m32.

REX.W + 81 /2 id

ADC r/m64, imm32

MI

Valid

N.E.

Add with CF imm32 sign extended to 64-bits to r/m64.

83 /2 ib

ADC r/m16, imm8

MI

Valid

Valid

Add with CF sign-extended imm8 to r/m16.

83 /2 ib

ADC r/m32, imm8

MI

Valid

Valid

Add with CF sign-extended imm8 into r/m32.

REX.W + 83 /2 ib

ADC r/m64, imm8

MI

Valid

N.E.

Add with CF sign-extended imm8 into r/m64.

10 /r

ADC r/m8, r8

MR

Valid

Valid

Add with carry byte register to r/m8.

REX + 10 /r

ADC r/m8*, r8*

MR

Valid

N.E.

Add with carry byte register to r/m64.

11 /r

ADC r/m16, r16

MR

Valid

Valid

Add with carry r16 to r/m16.

11 /r

ADC r/m32, r32

MR

Valid

Valid

Add with CF r32 to r/m32.

REX.W + 11 /r

ADC r/m64, r64

MR

Valid

N.E.

Add with CF r64 to r/m64.

12 /r

ADC r8, r/m8

RM

Valid

Valid

Add with carry r/m8 to byte register.

REX + 12 /r

ADC r8*, r/m8*

RM

Valid

N.E.

Add with carry r/m64 to byte register.

13 /r

ADC r16, r/m16

RM

Valid

Valid

Add with carry r/m16 to r16.

13 /r

ADC r32, r/m32

RM

Valid

Valid

Add with CF r/m32 to r32.

REX.W + 13 /r

ADC r64, r/m64

RM

Valid

N.E.

Add with CF r/m64 to r64.

NOTES:

*In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

MR

ModRM:r/m (r, w)

ModRM:reg (r)

NA

NA

MI

ModRM:r/m (r, w)

imm8/16/32

NA

NA

I

AL/AX/EAX/RAX

imm8/16/32

NA

NA

Description

Adds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a carry from a previous addition. When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.



The ADC instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

The ADC instruction is usually executed as part of a multibyte or multiword addition in which an ADD instruction is followed by an ADC instruction.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.


Operation

DEST DEST SRC CF;


Intel C/C Compiler Intrinsic Equivalent

ADC: extern unsigned char _addcarry_u8(unsigned char c_in, unsigned char src1, unsigned char src2, unsigned char *sum_out);

ADC: extern unsigned char _addcarry_u16(unsigned char c_in, unsigned short src1, unsigned short src2, unsigned short

*sum_out);

ADC: extern unsigned char _addcarry_u32(unsigned char c_in, unsigned int src1, unsigned char int, unsigned int *sum_out); ADC: extern unsigned char _addcarry_u64(unsigned char c_in, unsigned int64 src1, unsigned int64 src2, unsigned int64

*sum_out);


Flags Affected

The OF, SF, ZF, AF, CF, and PF flags are set according to the result.


Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.



Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


ADCX — Unsigned Integer Addition of Two Operands with Carry Flag

Opcode/ Instruction

Op/ En

64/32bit Mode Support

CPUID

Feature Flag

Description

66 0F 38 F6 /r

ADCX r32, r/m32

RM

V/V

ADX

Unsigned addition of r32 with CF, r/m32 to r32, writes CF.

66 REX.w 0F 38 F6 /r

ADCX r64, r/m64

RM

V/NE

ADX

Unsigned addition of r64 with CF, r/m64 to r64, writes CF.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

Description

Performs an unsigned addition of the destination operand (first operand), the source operand (second operand) and the carry-flag (CF) and stores the result in the destination operand. The destination operand is a general- purpose register, whereas the source operand can be a general-purpose register or memory location. The state of CF can represent a carry from a previous addition. The instruction sets the CF flag with the carry generated by the unsigned addition of the operands.

The ADCX instruction is executed in the context of multi-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of additions, we need to make sure the CF is in a desired initial state.

Often, this initial state needs to be 0, which can be achieved with an instruction to zero the CF (e.g. XOR).

This instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64- bit mode.

In 64-bit mode, the default operation size is 32 bits. Using a REX Prefix in the form of REX.R permits access to addi- tional registers (R8-15). Using REX Prefix in the form of REX.W promotes operation to 64 bits.

ADCX executes normally either inside or outside a transaction region.

Note: ADCX defines the OF flag differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32 Archi- tectures Software Developer’s Manual, Volume 2A.


Operation

IF OperandSize is 64-bit

THEN CF:DEST[63:0] DEST[63:0] + SRC[63:0] + CF; ELSE CF:DEST[31:0] DEST[31:0] + SRC[31:0] + CF;

FI;


Flags Affected

CF is updated based on result. OF, SF, ZF, AF and PF flags are unmodified.


Intel C/C++ Compiler Intrinsic Equivalent

unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);

unsigned char _addcarryx_u64 (unsigned char c_in, unsigned int64 src1, unsigned int64 src2, unsigned int64 *sum_out);


SIMD Floating-Point Exceptions

None


Protected Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.



#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


Real-Address Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.


Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


ADD—Add

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

04 ib

ADD AL, imm8

I

Valid

Valid

Add imm8 to AL.

05 iw

ADD AX, imm16

I

Valid

Valid

Add imm16 to AX.

05 id

ADD EAX, imm32

I

Valid

Valid

Add imm32 to EAX.

REX.W + 05 id

ADD RAX, imm32

I

Valid

N.E.

Add imm32 sign-extended to 64-bits to RAX.

80 /0 ib

ADD r/m8, imm8

MI

Valid

Valid

Add imm8 to r/m8.

REX + 80 /0 ib

ADD r/m8*, imm8

MI

Valid

N.E.

Add sign-extended imm8 to r/m8.

81 /0 iw

ADD r/m16, imm16

MI

Valid

Valid

Add imm16 to r/m16.

81 /0 id

ADD r/m32, imm32

MI

Valid

Valid

Add imm32 to r/m32.

REX.W + 81 /0 id

ADD r/m64, imm32

MI

Valid

N.E.

Add imm32 sign-extended to 64-bits to

r/m64.

83 /0 ib

ADD r/m16, imm8

MI

Valid

Valid

Add sign-extended imm8 to r/m16.

83 /0 ib

ADD r/m32, imm8

MI

Valid

Valid

Add sign-extended imm8 to r/m32.

REX.W + 83 /0 ib

ADD r/m64, imm8

MI

Valid

N.E.

Add sign-extended imm8 to r/m64.

00 /r

ADD r/m8, r8

MR

Valid

Valid

Add r8 to r/m8.

REX + 00 /r

ADD r/m8*, r8*

MR

Valid

N.E.

Add r8 to r/m8.

01 /r

ADD r/m16, r16

MR

Valid

Valid

Add r16 to r/m16.

01 /r

ADD r/m32, r32

MR

Valid

Valid

Add r32 to r/m32.

REX.W + 01 /r

ADD r/m64, r64

MR

Valid

N.E.

Add r64 to r/m64.

02 /r

ADD r8, r/m8

RM

Valid

Valid

Add r/m8 to r8.

REX + 02 /r

ADD r8*, r/m8*

RM

Valid

N.E.

Add r/m8 to r8.

03 /r

ADD r16, r/m16

RM

Valid

Valid

Add r/m16 to r16.

03 /r

ADD r32, r/m32

RM

Valid

Valid

Add r/m32 to r32.

REX.W + 03 /r

ADD r64, r/m64

RM

Valid

N.E.

Add r/m64 to r64.

NOTES:

*In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

MR

ModRM:r/m (r, w)

ModRM:reg (r)

NA

NA

MI

ModRM:r/m (r, w)

imm8/16/32

NA

NA

I

AL/AX/EAX/RAX

imm8/16/32

NA

NA

Description

Adds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The ADD instruction performs integer addition. It evaluates the result for both signed and unsigned integer oper- ands and sets the CF and OF flags to indicate a carry (overflow) in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.



This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.


Operation

DEST DEST SRC;


Flags Affected

The OF, SF, ZF, AF, CF, and PF flags are set according to the result.


Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


ADDPD—Add Packed Double-Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

66 0F 58 /r

ADDPD xmm1, xmm2/m128

A

V/V

SSE2

Add packed double-precision floating-point values from xmm2/mem to xmm1 and store result in xmm1.

VEX.NDS.128.66.0F.WIG 58 /r

VADDPD xmm1,xmm2, xmm3/m128

B

V/V

AVX

Add packed double-precision floating-point values from xmm3/mem to xmm2 and store result in xmm1.

VEX.NDS.256.66.0F.WIG 58 /r

VADDPD ymm1, ymm2, ymm3/m256

B

V/V

AVX

Add packed double-precision floating-point values from ymm3/mem to ymm2 and store result in ymm1.

EVEX.NDS.128.66.0F.W1 58 /r

VADDPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst

C

V/V

AVX512VL AVX512F

Add packed double-precision floating-point values from xmm3/m128/m64bcst to xmm2 and store result in xmm1 with writemask k1.

EVEX.NDS.256.66.0F.W1 58 /r

VADDPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst

C

V/V

AVX512VL AVX512F

Add packed double-precision floating-point values from ymm3/m256/m64bcst to ymm2 and store result in ymm1 with writemask k1.

EVEX.NDS.512.66.0F.W1 58 /r

VADDPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}

C

V/V

AVX512F

Add packed double-precision floating-point values from zmm3/m512/m64bcst to zmm2 and store result in zmm1 with writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Full

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Add two, four or eight packed double-precision floating-point values from the first source operand to the second source operand, and stores the packed double-precision floating-point results in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAXVL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: the first source operand is a XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper Bits (MAXVL-1:128) of the corresponding ZMM register destination are unmodified.



Operation

VADDPD (EVEX encoded versions) when src2 operand is a vector register

(KL, VL) = (2, 128), (4, 256), (8, 512) IF (VL = 512) AND (EVEX.b = 1)

THEN

SET_RM(EVEX.RC);

ELSE

SET_RM(MXCSR.RM);

FI;

FOR j 0 TO KL-1

i j * 64

IF k1[j] OR *no writemask*

THEN DEST[i+63:i] SRC1[i+63:i] + SRC2[i+63:i] ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+63:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+63:i] 0

FI

FI;

ENDFOR

DEST[MAXVL-1:VL] 0


VADDPD (EVEX encoded versions) when src2 operand is a memory source

(KL, VL) = (2, 128), (4, 256), (8, 512)


FOR j 0 TO KL-1

i j * 64

IF k1[j] OR *no writemask* THEN

IF (EVEX.b = 1) THEN

DEST[i+63:i] SRC1[i+63:i] + SRC2[63:0] ELSE

DEST[i+63:i] SRC1[i+63:i] + SRC2[i+63:i]

FI;

ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+63:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+63:i] 0

FI

FI;

ENDFOR

DEST[MAXVL-1:VL] 0


VADDPD (VEX.256 encoded version) DEST[63:0] SRC1[63:0] + SRC2[63:0] DEST[127:64] SRC1[127:64] + SRC2[127:64]

DEST[191:128] SRC1[191:128] + SRC2[191:128] DEST[255:192] SRC1[255:192] + SRC2[255:192] DEST[MAXVL-1:256] 0

.



VADDPD (VEX.128 encoded version) DEST[63:0] SRC1[63:0] + SRC2[63:0] DEST[127:64] SRC1[127:64] + SRC2[127:64] DEST[MAXVL-1:128] 0


ADDPD (128-bit Legacy SSE version) DEST[63:0] DEST[63:0] + SRC[63:0] DEST[127:64] DEST[127:64] + SRC[127:64]

DEST[MAXVL-1:128] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VADDPD m512d _mm512_add_pd ( m512d a, m512d b);

VADDPD m512d _mm512_mask_add_pd ( m512d s, mmask8 k, m512d a, m512d b); VADDPD m512d _mm512_maskz_add_pd ( mmask8 k, m512d a, m512d b);

VADDPD m256d _mm256_mask_add_pd ( m256d s, mmask8 k, m256d a, m256d b); VADDPD m256d _mm256_maskz_add_pd ( mmask8 k, m256d a, m256d b);

VADDPD m128d _mm_mask_add_pd ( m128d s, mmask8 k, m128d a, m128d b); VADDPD m128d _mm_maskz_add_pd ( mmask8 k, m128d a, m128d b);

VADDPD m512d _mm512_add_round_pd ( m512d a, m512d b, int);

VADDPD m512d _mm512_mask_add_round_pd ( m512d s, mmask8 k, m512d a, m512d b, int); VADDPD m512d _mm512_maskz_add_round_pd ( mmask8 k, m512d a, m512d b, int);

ADDPD m256d _mm256_add_pd ( m256d a, m256d b); ADDPD m128d _mm_add_pd ( m128d a, m128d b);


SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal


Other Exceptions

VEX-encoded instruction, see Exceptions Type 2. EVEX-encoded instruction, see Exceptions Type E2.


ADDPS—Add Packed Single-Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

NP 0F 58 /r

ADDPS xmm1, xmm2/m128

A

V/V

SSE

Add packed single-precision floating-point values from xmm2/m128 to xmm1 and store result in xmm1.

VEX.NDS.128.0F.WIG 58 /r

VADDPS xmm1,xmm2, xmm3/m128

B

V/V

AVX

Add packed single-precision floating-point values from xmm3/m128 to xmm2 and store result in xmm1.

VEX.NDS.256.0F.WIG 58 /r

VADDPS ymm1, ymm2, ymm3/m256

B

V/V

AVX

Add packed single-precision floating-point values from ymm3/m256 to ymm2 and store result in ymm1.

EVEX.NDS.128.0F.W0 58 /r

VADDPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst

C

V/V

AVX512VL AVX512F

Add packed single-precision floating-point values from xmm3/m128/m32bcst to xmm2 and store result in xmm1 with writemask k1.

EVEX.NDS.256.0F.W0 58 /r

VADDPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst

C

V/V

AVX512VL AVX512F

Add packed single-precision floating-point values from ymm3/m256/m32bcst to ymm2 and store result in ymm1 with writemask k1.

EVEX.NDS.512.0F.W0 58 /r

VADDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst {er}

C

V/V

AVX512F

Add packed single-precision floating-point values from zmm3/m512/m32bcst to zmm2 and store result in zmm1 with writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Full

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Add four, eight or sixteen packed single-precision floating-point values from the first source operand with the second source operand, and stores the packed single-precision floating-point results in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAXVL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: the first source operand is a XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper Bits (MAXVL-1:128) of the corresponding ZMM register destination are unmodified.



Operation

VADDPS (EVEX encoded versions) when src2 operand is a register

(KL, VL) = (4, 128), (8, 256), (16, 512) IF (VL = 512) AND (EVEX.b = 1)

THEN

SET_RM(EVEX.RC);

ELSE

SET_RM(MXCSR.RM);

FI;

FOR j 0 TO KL-1

i j * 32

IF k1[j] OR *no writemask*

THEN DEST[i+31:i] SRC1[i+31:i] + SRC2[i+31:i] ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+31:i] 0

FI

FI;

ENDFOR;

DEST[MAXVL-1:VL] 0


VADDPS (EVEX encoded versions) when src2 operand is a memory source

(KL, VL) = (4, 128), (8, 256), (16, 512)


FOR j 0 TO KL-1

i j * 32

IF k1[j] OR *no writemask* THEN

IF (EVEX.b = 1) THEN

DEST[i+31:i] SRC1[i+31:i] + SRC2[31:0] ELSE

DEST[i+31:i] SRC1[i+31:i] + SRC2[i+31:i]

FI;

ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+31:i] 0

FI

FI;

ENDFOR;

DEST[MAXVL-1:VL] 0



VADDPS (VEX.256 encoded version) DEST[31:0] SRC1[31:0] + SRC2[31:0] DEST[63:32] SRC1[63:32] + SRC2[63:32] DEST[95:64] SRC1[95:64] + SRC2[95:64] DEST[127:96] SRC1[127:96] + SRC2[127:96]

DEST[159:128] SRC1[159:128] + SRC2[159:128] DEST[191:160] SRC1[191:160] + SRC2[191:160] DEST[223:192] SRC1[223:192] + SRC2[223:192] DEST[255:224] SRC1[255:224] + SRC2[255:224]. DEST[MAXVL-1:256] 0


VADDPS (VEX.128 encoded version) DEST[31:0] SRC1[31:0] + SRC2[31:0] DEST[63:32] SRC1[63:32] + SRC2[63:32] DEST[95:64] SRC1[95:64] + SRC2[95:64] DEST[127:96] SRC1[127:96] + SRC2[127:96] DEST[MAXVL-1:128] 0


ADDPS (128-bit Legacy SSE version) DEST[31:0] SRC1[31:0] + SRC2[31:0] DEST[63:32] SRC1[63:32] + SRC2[63:32] DEST[95:64] SRC1[95:64] + SRC2[95:64]

DEST[127:96] SRC1[127:96] + SRC2[127:96]

DEST[MAXVL-1:128] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VADDPS m512 _mm512_add_ps ( m512 a, m512 b);

VADDPS m512 _mm512_mask_add_ps ( m512 s, mmask16 k, m512 a, m512 b); VADDPS m512 _mm512_maskz_add_ps ( mmask16 k, m512 a, m512 b);

VADDPS m256 _mm256_mask_add_ps ( m256 s, mmask8 k, m256 a, m256 b); VADDPS m256 _mm256_maskz_add_ps ( mmask8 k, m256 a, m256 b);

VADDPS m128 _mm_mask_add_ps ( m128d s, mmask8 k, m128 a, m128 b); VADDPS m128 _mm_maskz_add_ps ( mmask8 k, m128 a, m128 b);

VADDPS m512 _mm512_add_round_ps ( m512 a, m512 b, int);

VADDPS m512 _mm512_mask_add_round_ps ( m512 s, mmask16 k, m512 a, m512 b, int); VADDPS m512 _mm512_maskz_add_round_ps ( mmask16 k, m512 a, m512 b, int);

ADDPS m256 _mm256_add_ps ( m256 a, m256 b); ADDPS m128 _mm_add_ps ( m128 a, m128 b);


SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal


Other Exceptions

VEX-encoded instruction, see Exceptions Type 2. EVEX-encoded instruction, see Exceptions Type E2.


ADDSD—Add Scalar Double-Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

F2 0F 58 /r

ADDSD xmm1, xmm2/m64

A

V/V

SSE2

Add the low double-precision floating-point value from xmm2/mem to xmm1 and store the result in xmm1.

VEX.NDS.LIG.F2.0F.WIG 58 /r

VADDSD xmm1, xmm2, xmm3/m64

B

V/V

AVX

Add the low double-precision floating-point value from xmm3/mem to xmm2 and store the result in xmm1.

EVEX.NDS.LIG.F2.0F.W1 58 /r VADDSD xmm1 {k1}{z},

xmm2, xmm3/m64{er}

C

V/V

AVX512F

Add the low double-precision floating-point value from xmm3/m64 to xmm2 and store the result in xmm1 with writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Tuple1 Scalar

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Adds the low double-precision floating-point values from the second source operand and the first source operand and stores the double-precision floating-point result in the destination operand.

The second source operand can be an XMM register or a 64-bit memory location. The first source and destination operands are XMM registers.

128-bit Legacy SSE version: The first source and destination operands are the same. Bits (MAXVL-1:64) of the corresponding destination register remain unchanged.

EVEX and VEX.128 encoded version: The first source operand is encoded by EVEX.vvvv/VEX.vvvv. Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (MAXVL-1:128) of the destination register are zeroed.

EVEX version: The low quadword element of the destination is updated according to the writemask.

Software should ensure VADDSD is encoded with VEX.L=0. Encoding VADDSD with VEX.L=1 may encounter unpredictable behavior across different processor generations.



Operation

VADDSD (EVEX encoded version)

IF (EVEX.b = 1) AND SRC2 *is a register* THEN

SET_RM(EVEX.RC);

ELSE

SET_RM(MXCSR.RM);

FI;

IF k1[0] or *no writemask*

THEN DEST[63:0] SRC1[63:0] + SRC2[63:0] ELSE

IF *merging-masking* ; merging-masking THEN *DEST[63:0] remains unchanged*

ELSE ; zeroing-masking

THEN DEST[63:0] 0

FI;

FI;

DEST[127:64] SRC1[127:64] DEST[MAXVL-1:128] 0


VADDSD (VEX.128 encoded version) DEST[63:0] SRC1[63:0] + SRC2[63:0] DEST[127:64] SRC1[127:64] DEST[MAXVL-1:128] 0


ADDSD (128-bit Legacy SSE version)

DEST[63:0] DEST[63:0] + SRC[63:0]

DEST[MAXVL-1:64] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VADDSD m128d _mm_mask_add_sd ( m128d s, mmask8 k, m128d a, m128d b); VADDSD m128d _mm_maskz_add_sd ( mmask8 k, m128d a, m128d b);

VADDSD m128d _mm_add_round_sd ( m128d a, m128d b, int);

VADDSD m128d _mm_mask_add_round_sd ( m128d s, mmask8 k, m128d a, m128d b, int); VADDSD m128d _mm_maskz_add_round_sd ( mmask8 k, m128d a, m128d b, int);

ADDSD m128d _mm_add_sd ( m128d a, m128d b);


SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal


Other Exceptions

VEX-encoded instruction, see Exceptions Type 3. EVEX-encoded instruction, see Exceptions Type E3.


ADDSS—Add Scalar Single-Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

F3 0F 58 /r

ADDSS xmm1, xmm2/m32

A

V/V

SSE

Add the low single-precision floating-point value from xmm2/mem to xmm1 and store the result in xmm1.

VEX.NDS.LIG.F3.0F.WIG 58 /r

VADDSS xmm1,xmm2, xmm3/m32

B

V/V

AVX

Add the low single-precision floating-point value from xmm3/mem to xmm2 and store the result in xmm1.

EVEX.NDS.LIG.F3.0F.W0 58 /r

VADDSS xmm1{k1}{z}, xmm2, xmm3/m32{er}

C

V/V

AVX512F

Add the low single-precision floating-point value from xmm3/m32 to xmm2 and store the result in xmm1with writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Tuple1 Scalar

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Adds the low single-precision floating-point values from the second source operand and the first source operand, and stores the double-precision floating-point result in the destination operand.

The second source operand can be an XMM register or a 64-bit memory location. The first source and destination operands are XMM registers.

128-bit Legacy SSE version: The first source and destination operands are the same. Bits (MAXVL-1:32) of the corresponding the destination register remain unchanged.

EVEX and VEX.128 encoded version: The first source operand is encoded by EVEX.vvvv/VEX.vvvv. Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (MAXVL-1:128) of the destination register are zeroed.

EVEX version: The low doubleword element of the destination is updated according to the writemask.

Software should ensure VADDSS is encoded with VEX.L=0. Encoding VADDSS with VEX.L=1 may encounter unpre- dictable behavior across different processor generations.



Operation

VADDSS (EVEX encoded versions)

IF (EVEX.b = 1) AND SRC2 *is a register* THEN

SET_RM(EVEX.RC);

ELSE

SET_RM(MXCSR.RM);

FI;

IF k1[0] or *no writemask*

THEN DEST[31:0] SRC1[31:0] + SRC2[31:0] ELSE

IF *merging-masking* ; merging-masking THEN *DEST[31:0] remains unchanged*

ELSE ; zeroing-masking

THEN DEST[31:0] 0

FI;

FI;

DEST[127:32] SRC1[127:32] DEST[MAXVL-1:128] 0


VADDSS DEST, SRC1, SRC2 (VEX.128 encoded version)

DEST[31:0] SRC1[31:0] + SRC2[31:0] DEST[127:32] SRC1[127:32] DEST[MAXVL-1:128] 0


ADDSS DEST, SRC (128-bit Legacy SSE version)

DEST[31:0] DEST[31:0] + SRC[31:0]

DEST[MAXVL-1:32] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VADDSS m128 _mm_mask_add_ss ( m128 s, mmask8 k, m128 a, m128 b); VADDSS m128 _mm_maskz_add_ss ( mmask8 k, m128 a, m128 b);

VADDSS m128 _mm_add_round_ss ( m128 a, m128 b, int);

VADDSS m128 _mm_mask_add_round_ss ( m128 s, mmask8 k, m128 a, m128 b, int); VADDSS m128 _mm_maskz_add_round_ss ( mmask8 k, m128 a, m128 b, int);

ADDSS m128 _mm_add_ss ( m128 a, m128 b);


SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal


Other Exceptions

VEX-encoded instruction, see Exceptions Type 3. EVEX-encoded instruction, see Exceptions Type E3.


ADDSUBPD—Packed Double-FP Add/Subtract

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F D0 /r

ADDSUBPD xmm1, xmm2/m128

RM

V/V

SSE3

Add/subtract double-precision floating-point values from xmm2/m128 to xmm1.

VEX.NDS.128.66.0F.WIG D0 /r

VADDSUBPD xmm1, xmm2, xmm3/m128

RVM

V/V

AVX

Add/subtract packed double-precision floating-point values from xmm3/mem to xmm2 and stores result in xmm1.

VEX.NDS.256.66.0F.WIG D0 /r

VADDSUBPD ymm1, ymm2, ymm3/m256

RVM

V/V

AVX

Add / subtract packed double-precision floating-point values from ymm3/mem to ymm2 and stores result in ymm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

RVM

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

NA

Description

Adds odd-numbered double-precision floating-point values of the first source operand (second operand) with the corresponding double-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered double-precision floating-point values from the second source operand from the corresponding double-precision floating values in the first source operand; stores the result into the even-numbered values of the destination operand.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-3.

VEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.


ADDSUBPD xmm1, xmm2/m128


[127:64]

[63:0]

xmm2/m128



xmm1[127:64] + xmm2/m128[127:64]

xmm1[63:0] - xmm2/m128[63:0]

RESULT:

xmm1


[127:64]

[63:0]


image

Figure 3-3. ADDSUBPD—Packed Double-FP Add/Subtract


Operation

ADDSUBPD (128-bit Legacy SSE version) DEST[63:0] DEST[63:0] - SRC[63:0] DEST[127:64] DEST[127:64] + SRC[127:64]

DEST[MAXVL-1:128] (Unmodified)


VADDSUBPD (VEX.128 encoded version) DEST[63:0] SRC1[63:0] - SRC2[63:0] DEST[127:64] SRC1[127:64] + SRC2[127:64] DEST[MAXVL-1:128] 0


VADDSUBPD (VEX.256 encoded version) DEST[63:0] SRC1[63:0] - SRC2[63:0] DEST[127:64] SRC1[127:64] + SRC2[127:64] DEST[191:128] SRC1[191:128] - SRC2[191:128] DEST[255:192] SRC1[255:192] + SRC2[255:192]


Intel C/C Compiler Intrinsic Equivalent

ADDSUBPD: m128d _mm_addsub_pd( m128d a, m128d b)

VADDSUBPD: m256d _mm256_addsub_pd ( m256d a, m256d b)


Exceptions

When the source operand is a memory operand, it must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.


SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal.


Other Exceptions

See Exceptions Type 2.


ADDSUBPS—Packed Single-FP Add/Subtract

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

F2 0F D0 /r

ADDSUBPS xmm1, xmm2/m128

RM

V/V

SSE3

Add/subtract single-precision floating-point values from xmm2/m128 to xmm1.

VEX.NDS.128.F2.0F.WIG D0 /r

VADDSUBPS xmm1, xmm2, xmm3/m128

RVM

V/V

AVX

Add/subtract single-precision floating-point values from xmm3/mem to xmm2 and stores result in xmm1.

VEX.NDS.256.F2.0F.WIG D0 /r

VADDSUBPS ymm1, ymm2, ymm3/m256

RVM

V/V

AVX

Add / subtract single-precision floating-point values from ymm3/mem to ymm2 and stores result in ymm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

RVM

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

NA

Description

Adds odd-numbered single-precision floating-point values of the first source operand (second operand) with the corresponding single-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered single-precision floating-point values from the second source operand from the corresponding single-precision floating values in the first source operand; stores the result into the even-numbered values of the destination operand.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-4.

VEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.


image

ADDSUBPS xmm1, xmm2/m128


[127:96]

[95:64]

[63:32]

[31:0]








xmm1[127:96] + xmm2/m128[127:96]


xmm1[95:64] - xmm2/ m128[95:64]


xmm1[63:32] +

xmm2/m128[63:32]


xmm1[31:0] -

xmm2/m128[31:0]

[127:96] [95:64] [63:32] [31:0]


xmm2/ m128


RESULT:

xmm1


OM15992


Figure 3-4. ADDSUBPS—Packed Single-FP Add/Subtract


Operation

ADDSUBPS (128-bit Legacy SSE version) DEST[31:0] DEST[31:0] - SRC[31:0] DEST[63:32] DEST[63:32] + SRC[63:32] DEST[95:64] DEST[95:64] - SRC[95:64] DEST[127:96] DEST[127:96] + SRC[127:96]

DEST[MAXVL-1:128] (Unmodified)


VADDSUBPS (VEX.128 encoded version) DEST[31:0] SRC1[31:0] - SRC2[31:0] DEST[63:32] SRC1[63:32] + SRC2[63:32] DEST[95:64] SRC1[95:64] - SRC2[95:64] DEST[127:96] SRC1[127:96] + SRC2[127:96] DEST[MAXVL-1:128] 0


VADDSUBPS (VEX.256 encoded version) DEST[31:0] SRC1[31:0] - SRC2[31:0] DEST[63:32] SRC1[63:32] + SRC2[63:32] DEST[95:64] SRC1[95:64] - SRC2[95:64] DEST[127:96] SRC1[127:96] + SRC2[127:96] DEST[159:128] SRC1[159:128] - SRC2[159:128] DEST[191:160] SRC1[191:160] + SRC2[191:160] DEST[223:192] SRC1[223:192] - SRC2[223:192]

DEST[255:224] SRC1[255:224] + SRC2[255:224].


Intel C/C Compiler Intrinsic Equivalent

ADDSUBPS: m128 _mm_addsub_ps( m128 a, m128 b) VADDSUBPS: m256 _mm256_addsub_ps ( m256 a, m256 b)

Exceptions

When the source operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general- protection exception (#GP) will be generated.



SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal.


Other Exceptions

See Exceptions Type 2.


ADOX — Unsigned Integer Addition of Two Operands with Overflow Flag

Opcode/ Instruction

Op/ En

64/32bit Mode Support

CPUID

Feature Flag

Description

F3 0F 38 F6 /r

ADOX r32, r/m32

RM

V/V

ADX

Unsigned addition of r32 with OF, r/m32 to r32, writes OF.

F3 REX.w 0F 38 F6 /r

ADOX r64, r/m64

RM

V/NE

ADX

Unsigned addition of r64 with OF, r/m64 to r64, writes OF.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

Description

Performs an unsigned addition of the destination operand (first operand), the source operand (second operand) and the overflow-flag (OF) and stores the result in the destination operand. The destination operand is a general- purpose register, whereas the source operand can be a general-purpose register or memory location. The state of OF represents a carry from a previous addition. The instruction sets the OF flag with the carry generated by the unsigned addition of the operands.

The ADOX instruction is executed in the context of multi-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of additions, we execute an instruction to zero the OF (e.g. XOR).

This instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode.

In 64-bit mode, the default operation size is 32 bits. Using a REX Prefix in the form of REX.R permits access to addi- tional registers (R8-15). Using REX Prefix in the form of REX.W promotes operation to 64-bits.

ADOX executes normally either inside or outside a transaction region.

Note: ADOX defines the CF and OF flags differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A.


Operation

IF OperandSize is 64-bit

THEN OF:DEST[63:0] DEST[63:0] + SRC[63:0] + OF; ELSE OF:DEST[31:0] DEST[31:0] + SRC[31:0] + OF;

FI;


Flags Affected

OF is updated based on result. CF, SF, ZF, AF and PF flags are unmodified.


Intel C/C++ Compiler Intrinsic Equivalent

unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);

unsigned char _addcarryx_u64 (unsigned char c_in, unsigned int64 src1, unsigned int64 src2, unsigned int64 *sum_out);


SIMD Floating-Point Exceptions

None



Protected Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


Real-Address Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.


Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


AESDEC—Perform One Round of an AES Decryption Flow

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 38 DE /r

AESDEC xmm1, xmm2/m128

RM

V/V

AES

Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.

VEX.NDS.128.66.0F38.WIG DE /r

VAESDEC xmm1, xmm2, xmm3/m128

RVM

V/V

Both AES and

AVX flags

Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand2

Operand3

Operand4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

RVM

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

NA

Description

This instruction performs a single round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and store the result in the destination operand.

Use the AESDEC instruction for all but the last decryption round. For the last decryption round, use the AESDE- CLAST instruction.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL- 1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the destination YMM register are zeroed.


Operation

AESDEC

STATE SRC1;

RoundKey SRC2;

STATE InvShiftRows( STATE ); STATE InvSubBytes( STATE ); STATE InvMixColumns( STATE ); DEST[127:0] STATE XOR RoundKey;

DEST[MAXVL-1:128] (Unmodified)


VAESDEC

STATE SRC1;

RoundKey SRC2;

STATE InvShiftRows( STATE ); STATE InvSubBytes( STATE ); STATE InvMixColumns( STATE ); DEST[127:0] STATE XOR RoundKey; DEST[MAXVL-1:128] 0



Intel C/C++ Compiler Intrinsic Equivalent

(V)AESDEC: m128i _mm_aesdec ( m128i, m128i)


SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4.


AESDECLAST—Perform Last Round of an AES Decryption Flow

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 38 DF /r

AESDECLAST xmm1, xmm2/m128

RM

V/V

AES

Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.

VEX.NDS.128.66.0F38.WIG DF /r

VAESDECLAST xmm1, xmm2, xmm3/m128

RVM

V/V

Both AES and

AVX flags

Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand2

Operand3

Operand4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

RVM

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

NA

Description

This instruction performs the last round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and store the result in the destination operand.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL- 1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the destination YMM register are zeroed.


Operation AESDECLAST STATE SRC1;

RoundKey SRC2;

STATE InvShiftRows( STATE ); STATE InvSubBytes( STATE ); DEST[127:0] STATE XOR RoundKey;

DEST[MAXVL-1:128] (Unmodified)


VAESDECLAST

STATE SRC1;

RoundKey SRC2;

STATE InvShiftRows( STATE ); STATE InvSubBytes( STATE ); DEST[127:0] STATE XOR RoundKey; DEST[MAXVL-1:128] 0

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESDECLAST: m128i _mm_aesdeclast ( m128i, m128i)



SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4.


AESENC—Perform One Round of an AES Encryption Flow

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 38 DC /r

AESENC xmm1, xmm2/m128

RM

V/V

AES

Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.

VEX.NDS.128.66.0F38.WIG DC /r

VAESENC xmm1, xmm2, xmm3/m128

RVM

V/V

Both AES and

AVX flags

Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from the xmm3/m128; store the result in xmm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand2

Operand3

Operand4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

RVM

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

NA

Description

This instruction performs a single round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and store the result in the destination operand.

Use the AESENC instruction for all but the last encryption rounds. For the last encryption round, use the AESENC- CLAST instruction.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL- 1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the destination YMM register are zeroed.


Operation

AESENC

STATE SRC1;

RoundKey SRC2;

STATE ShiftRows( STATE ); STATE SubBytes( STATE ); STATE MixColumns( STATE );

DEST[127:0] STATE XOR RoundKey;

DEST[MAXVL-1:128] (Unmodified)


VAESENC

STATE SRC1;

RoundKey SRC2;

STATE ShiftRows( STATE ); STATE SubBytes( STATE ); STATE MixColumns( STATE );

DEST[127:0] STATE XOR RoundKey; DEST[MAXVL-1:128] 0



Intel C/C++ Compiler Intrinsic Equivalent

(V)AESENC: m128i _mm_aesenc ( m128i, m128i)


SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4.


AESENCLAST—Perform Last Round of an AES Encryption Flow

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 38 DD /r

AESENCLAST xmm1, xmm2/m128

RM

V/V

AES

Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.

VEX.NDS.128.66.0F38.WIG DD /r

VAESENCLAST xmm1, xmm2, xmm3/m128

RVM

V/V

Both AES and

AVX flags

Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128 bit round key from xmm3/m128; store the result in xmm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand2

Operand3

Operand4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

RVM

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

NA

Description

This instruction performs the last round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and store the result in the destination operand.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL- 1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the destination YMM register are zeroed.


Operation AESENCLAST STATE SRC1;

RoundKey SRC2;

STATE ShiftRows( STATE ); STATE SubBytes( STATE );

DEST[127:0] STATE XOR RoundKey;

DEST[MAXVL-1:128] (Unmodified)


VAESENCLAST

STATE SRC1;

RoundKey SRC2;

STATE ShiftRows( STATE ); STATE SubBytes( STATE );

DEST[127:0] STATE XOR RoundKey; DEST[MAXVL-1:128] 0


Intel C/C++ Compiler Intrinsic Equivalent

(V)AESENCLAST: m128i _mm_aesenclast ( m128i, m128i)



SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4.


AESIMC—Perform the AES InvMixColumn Transformation

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 38 DB /r

AESIMC xmm1, xmm2/m128

RM

V/V

AES

Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1.

VEX.128.66.0F38.WIG DB /r

VAESIMC xmm1, xmm2/m128

RM

V/V

Both AES and

AVX flags

Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand2

Operand3

Operand4

RM

ModRM:reg (w)

ModRM:r/m (r)

NA

NA

Description

Perform the InvMixColumns transformation on the source operand and store the result in the destination operand. The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory loca- tion.

Note: the AESIMC instruction should be applied to the expanded AES round keys (except for the first and last round key) in order to prepare them for decryption using the “Equivalent Inverse Cipher” (defined in FIPS 197).

128-bit Legacy SSE version: Bits (MAXVL-1:128) of the corresponding YMM destination register remain unchanged. VEX.128 encoded version: Bits (MAXVL-1:128) of the destination YMM register are zeroed.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.


Operation

AESIMC

DEST[127:0] InvMixColumns( SRC ); DEST[MAXVL-1:128] (Unmodified)


VAESIMC

DEST[127:0] InvMixColumns( SRC ); DEST[MAXVL-1:128] 0;


Intel C/C++ Compiler Intrinsic Equivalent

(V)AESIMC: m128i _mm_aesimc ( m128i)


SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4; additionally

#UD If VEX.vvvv ≠ 1111B.


AESKEYGENASSIST—AES Round Key Generation Assist

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 3A DF /r ib

AESKEYGENASSIST xmm1, xmm2/m128, imm8

RMI

V/V

AES

Assist in AES round key generation using an 8 bits Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128 and stores the result in xmm1.

VEX.128.66.0F3A.WIG DF /r ib

VAESKEYGENASSIST xmm1, xmm2/m128, imm8

RMI

V/V

Both AES and

AVX flags

Assist in AES round key generation using 8 bits Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128 and stores the result in xmm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand2

Operand3

Operand4

RMI

ModRM:reg (w)

ModRM:r/m (r)

imm8

NA

Description

Assist in expanding the AES cipher key, by computing steps towards generating a round key for encryption, using 128-bit data specified in the source operand and an 8-bit round constant specified as an immediate, store the result in the destination operand.

The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory loca- tion.

128-bit Legacy SSE version: Bits (MAXVL-1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: Bits (MAXVL-1:128) of the destination YMM register are zeroed.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.


Operation AESKEYGENASSIST X3[31:0] SRC [127: 96];

X2[31:0] SRC [95: 64];

X1[31:0] SRC [63: 32];

X0[31:0] SRC [31: 0];

RCON[31:0] ZeroExtend(Imm8[7:0]); DEST[31:0] SubWord(X1);

DEST[63:32 ] RotWord( SubWord(X1) ) XOR RCON; DEST[95:64] SubWord(X3);

DEST[127:96] RotWord( SubWord(X3) ) XOR RCON;

DEST[MAXVL-1:128] (Unmodified)



VAESKEYGENASSIST

X3[31:0] SRC [127: 96];

X2[31:0] SRC [95: 64];

X1[31:0] SRC [63: 32];

X0[31:0] SRC [31: 0];

RCON[31:0] ZeroExtend(Imm8[7:0]); DEST[31:0] SubWord(X1);

DEST[63:32 ] RotWord( SubWord(X1) ) XOR RCON; DEST[95:64] SubWord(X3);

DEST[127:96] RotWord( SubWord(X3) ) XOR RCON; DEST[MAXVL-1:128] 0;


Intel C/C++ Compiler Intrinsic Equivalent

(V)AESKEYGENASSIST: m128i _mm_aeskeygenassist ( m128i, const int)


SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4; additionally

#UD If VEX.vvvv ≠ 1111B.


AND—Logical AND

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

24 ib

AND AL, imm8

I

Valid

Valid

AL AND imm8.

25 iw

AND AX, imm16

I

Valid

Valid

AX AND imm16.

25 id

AND EAX, imm32

I

Valid

Valid

EAX AND imm32.

REX.W + 25 id

AND RAX, imm32

I

Valid

N.E.

RAX AND imm32 sign-extended to 64-bits.

80 /4 ib

AND r/m8, imm8

MI

Valid

Valid

r/m8 AND imm8.

REX + 80 /4 ib

AND r/m8*, imm8

MI

Valid

N.E.

r/m8 AND imm8.

81 /4 iw

AND r/m16, imm16

MI

Valid

Valid

r/m16 AND imm16.

81 /4 id

AND r/m32, imm32

MI

Valid

Valid

r/m32 AND imm32.

REX.W + 81 /4 id

AND r/m64, imm32

MI

Valid

N.E.

r/m64 AND imm32 sign extended to 64-bits.

83 /4 ib

AND r/m16, imm8

MI

Valid

Valid

r/m16 AND imm8 (sign-extended).

83 /4 ib

AND r/m32, imm8

MI

Valid

Valid

r/m32 AND imm8 (sign-extended).

REX.W + 83 /4 ib

AND r/m64, imm8

MI

Valid

N.E.

r/m64 AND imm8 (sign-extended).

20 /r

AND r/m8, r8

MR

Valid

Valid

r/m8 AND r8.

REX + 20 /r

AND r/m8*, r8*

MR

Valid

N.E.

r/m64 AND r8 (sign-extended).

21 /r

AND r/m16, r16

MR

Valid

Valid

r/m16 AND r16.

21 /r

AND r/m32, r32

MR

Valid

Valid

r/m32 AND r32.

REX.W + 21 /r

AND r/m64, r64

MR

Valid

N.E.

r/m64 AND r32.

22 /r

AND r8, r/m8

RM

Valid

Valid

r8 AND r/m8.

REX + 22 /r

AND r8*, r/m8*

RM

Valid

N.E.

r/m64 AND r8 (sign-extended).

23 /r

AND r16, r/m16

RM

Valid

Valid

r16 AND r/m16.

23 /r

AND r32, r/m32

RM

Valid

Valid

r32 AND r/m32.

REX.W + 23 /r

AND r64, r/m64

RM

Valid

N.E.

r64 AND r/m64.

NOTES:

*In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

MR

ModRM:r/m (r, w)

ModRM:reg (r)

NA

NA

MI

ModRM:r/m (r, w)

imm8/16/32

NA

NA

I

AL/AX/EAX/RAX

imm8/16/32

NA

NA

Description

Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0.

This instruction can be used with a LOCK prefix to allow the it to be executed atomically.

In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.


AND—Logical AND Vol. 2A 3-61



Operation

DEST DEST AND SRC;


Flags Affected

The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.


Protected Mode Exceptions

#GP(0) If the destination operand points to a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


ANDN — Logical AND NOT

Opcode/Instruction

Op/ En

64/32

-bit Mode

CPUID

Feature Flag

Description

VEX.NDS.LZ.0F38.W0 F2 /r

ANDN r32a, r32b, r/m32

RVM

V/V

BMI1

Bitwise AND of inverted r32b with r/m32, store result in r32a.

VEX.NDS.LZ. 0F38.W1 F2 /r

ANDN r64a, r64b, r/m64

RVM

V/NE

BMI1

Bitwise AND of inverted r64b with r/m64, store result in r64a.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RVM

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

NA

Description

Performs a bitwise logical AND of inverted second operand (the first source operand) with the third operand (the second source operand). The result is stored in the first operand (destination operand).

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.


Operation

DEST (NOT SRC1) bitwiseAND SRC2;

SF DEST[OperandSize -1]; ZF (DEST = 0);

Flags Affected

SF and ZF are updated based on result. OF and CF flags are cleared. AF and PF flags are undefined.


Intel C/C++ Compiler Intrinsic Equivalent

Auto-generated from high-level language.


SIMD Floating-Point Exceptions

None


Other Exceptions

See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally

#UD If VEX.W = 1.


ANDPD—Bitwise Logical AND of Packed Double Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

66 0F 54 /r

ANDPD xmm1, xmm2/m128

A

V/V

SSE2

Return the bitwise logical AND of packed double- precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.66.0F 54 /r

VANDPD xmm1, xmm2, xmm3/m128

B

V/V

AVX

Return the bitwise logical AND of packed double- precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.66.0F 54 /r

VANDPD ymm1, ymm2, ymm3/m256

B

V/V

AVX

Return the bitwise logical AND of packed double- precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.66.0F.W1 54 /r

VANDPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND of packed double- precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1.

EVEX.NDS.256.66.0F.W1 54 /r

VANDPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND of packed double- precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1.

EVEX.NDS.512.66.0F.W1 54 /r

VANDPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst

C

V/V

AVX512DQ

Return the bitwise logical AND of packed double- precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Full

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Performs a bitwise logical AND of the two, four or eight packed double-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAXVL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding register destination are unmodified.



Operation

VANDPD (EVEX encoded versions)

(KL, VL) = (2, 128), (4, 256), (8, 512)

FOR j 0 TO KL-1

i j * 64

IF k1[j] OR *no writemask* THEN

IF (EVEX.b == 1) AND (SRC2 *is memory*) THEN

DEST[i+63:i] SRC1[i+63:i] BITWISE AND SRC2[63:0] ELSE

DEST[i+63:i] SRC1[i+63:i] BITWISE AND SRC2[i+63:i]

FI;

ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+63:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+63:i] = 0

FI;

FI;

ENDFOR

DEST[MAXVL-1:VL] 0


VANDPD (VEX.256 encoded version)

DEST[63:0] SRC1[63:0] BITWISE AND SRC2[63:0] DEST[127:64] SRC1[127:64] BITWISE AND SRC2[127:64] DEST[191:128] SRC1[191:128] BITWISE AND SRC2[191:128] DEST[255:192] SRC1[255:192] BITWISE AND SRC2[255:192] DEST[MAXVL-1:256] 0


VANDPD (VEX.128 encoded version)

DEST[63:0] SRC1[63:0] BITWISE AND SRC2[63:0] DEST[127:64] SRC1[127:64] BITWISE AND SRC2[127:64] DEST[MAXVL-1:128] 0


ANDPD (128-bit Legacy SSE version)

DEST[63:0] DEST[63:0] BITWISE AND SRC[63:0] DEST[127:64] DEST[127:64] BITWISE AND SRC[127:64]

DEST[MAXVL-1:128] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VANDPD m512d _mm512_and_pd ( m512d a, m512d b);

VANDPD m512d _mm512_mask_and_pd ( m512d s, mmask8 k, m512d a, m512d b); VANDPD m512d _mm512_maskz_and_pd ( mmask8 k, m512d a, m512d b);

VANDPD m256d _mm256_mask_and_pd ( m256d s, mmask8 k, m256d a, m256d b); VANDPD m256d _mm256_maskz_and_pd ( mmask8 k, m256d a, m256d b);

VANDPD m128d _mm_mask_and_pd ( m128d s, mmask8 k, m128d a, m128d b); VANDPD m128d _mm_maskz_and_pd ( mmask8 k, m128d a, m128d b);

VANDPD m256d _mm256_and_pd ( m256d a, m256d b); ANDPD m128d _mm_and_pd ( m128d a, m128d b);


SIMD Floating-Point Exceptions

None



Other Exceptions

VEX-encoded instruction, see Exceptions Type 4. EVEX-encoded instruction, see Exceptions Type E4.


ANDPS—Bitwise Logical AND of Packed Single Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

NP 0F 54 /r

ANDPS xmm1, xmm2/m128

A

V/V

SSE

Return the bitwise logical AND of packed single-precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.0F 54 /r

VANDPS xmm1,xmm2, xmm3/m128

B

V/V

AVX

Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.0F 54 /r

VANDPS ymm1, ymm2, ymm3/m256

B

V/V

AVX

Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.0F.W0 54 /r

VANDPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1.

EVEX.NDS.256.0F.W0 54 /r

VANDPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1.

EVEX.NDS.512.0F.W0 54 /r

VANDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst

C

V/V

AVX512DQ

Return the bitwise logical AND of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Full

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Performs a bitwise logical AND of the four, eight or sixteen packed single-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAXVL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding ZMM register destination are unmodified.



Operation

VANDPS (EVEX encoded versions)

(KL, VL) = (4, 128), (8, 256), (16, 512)

FOR j 0 TO KL-1

i j * 32

IF k1[j] OR *no writemask*

IF (EVEX.b == 1) AND (SRC2 *is memory*) THEN

DEST[i+63:i] SRC1[i+31:i] BITWISE AND SRC2[31:0] ELSE

DEST[i+31:i] SRC1[i+31:i] BITWISE AND SRC2[i+31:i]

FI;

ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+31:i] 0

FI;

FI;

ENDFOR

DEST[MAXVL-1:VL] 0;


VANDPS (VEX.256 encoded version)

DEST[31:0] SRC1[31:0] BITWISE AND SRC2[31:0] DEST[63:32] SRC1[63:32] BITWISE AND SRC2[63:32] DEST[95:64] SRC1[95:64] BITWISE AND SRC2[95:64] DEST[127:96] SRC1[127:96] BITWISE AND SRC2[127:96] DEST[159:128] SRC1[159:128] BITWISE AND SRC2[159:128] DEST[191:160] SRC1[191:160] BITWISE AND SRC2[191:160] DEST[223:192] SRC1[223:192] BITWISE AND SRC2[223:192] DEST[255:224] SRC1[255:224] BITWISE AND SRC2[255:224]. DEST[MAXVL-1:256] 0;


VANDPS (VEX.128 encoded version)

DEST[31:0] SRC1[31:0] BITWISE AND SRC2[31:0] DEST[63:32] SRC1[63:32] BITWISE AND SRC2[63:32] DEST[95:64] SRC1[95:64] BITWISE AND SRC2[95:64] DEST[127:96] SRC1[127:96] BITWISE AND SRC2[127:96] DEST[MAXVL-1:128] 0;


ANDPS (128-bit Legacy SSE version)

DEST[31:0] DEST[31:0] BITWISE AND SRC[31:0] DEST[63:32] DEST[63:32] BITWISE AND SRC[63:32] DEST[95:64] DEST[95:64] BITWISE AND SRC[95:64] DEST[127:96] DEST[127:96] BITWISE AND SRC[127:96]

DEST[MAXVL-1:128] (Unmodified)



Intel C/C++ Compiler Intrinsic Equivalent

VANDPS m512 _mm512_and_ps ( m512 a, m512 b);

VANDPS m512 _mm512_mask_and_ps ( m512 s, mmask16 k, m512 a, m512 b); VANDPS m512 _mm512_maskz_and_ps ( mmask16 k, m512 a, m512 b);

VANDPS m256 _mm256_mask_and_ps ( m256 s, mmask8 k, m256 a, m256 b); VANDPS m256 _mm256_maskz_and_ps ( mmask8 k, m256 a, m256 b);

VANDPS m128 _mm_mask_and_ps ( m128 s, mmask8 k, m128 a, m128 b); VANDPS m128 _mm_maskz_and_ps ( mmask8 k, m128 a, m128 b);

VANDPS m256 _mm256_and_ps ( m256 a, m256 b); ANDPS m128 _mm_and_ps ( m128 a, m128 b);


SIMD Floating-Point Exceptions

None


Other Exceptions

VEX-encoded instruction, see Exceptions Type 4. EVEX-encoded instruction, see Exceptions Type E4.


ANDNPD—Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

66 0F 55 /r

ANDNPD xmm1, xmm2/m128

A

V/V

SSE2

Return the bitwise logical AND NOT of packed double- precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.66.0F 55 /r

VANDNPD xmm1, xmm2, xmm3/m128

B

V/V

AVX

Return the bitwise logical AND NOT of packed double- precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.66.0F 55/r

VANDNPD ymm1, ymm2, ymm3/m256

B

V/V

AVX

Return the bitwise logical AND NOT of packed double- precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.66.0F.W1 55 /r

VANDNPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND NOT of packed double- precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1.

EVEX.NDS.256.66.0F.W1 55 /r

VANDNPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND NOT of packed double- precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1.

EVEX.NDS.512.66.0F.W1 55 /r

VANDNPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst

C

V/V

AVX512DQ

Return the bitwise logical AND NOT of packed double- precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Full

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Performs a bitwise logical AND NOT of the two, four or eight packed double-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAXVL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding register destination are unmodified.



Operation

VANDNPD (EVEX encoded versions)

(KL, VL) = (2, 128), (4, 256), (8, 512)

FOR j 0 TO KL-1

i j * 64

IF k1[j] OR *no writemask*

IF (EVEX.b == 1) AND (SRC2 *is memory*) THEN

DEST[i+63:i] (NOT(SRC1[i+63:i])) BITWISE AND SRC2[63:0] ELSE

DEST[i+63:i] (NOT(SRC1[i+63:i])) BITWISE AND SRC2[i+63:i]

FI;

ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+63:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+63:i] = 0

FI;

FI;

ENDFOR

DEST[MAXVL-1:VL] 0


VANDNPD (VEX.256 encoded version)

DEST[63:0] (NOT(SRC1[63:0])) BITWISE AND SRC2[63:0] DEST[127:64] (NOT(SRC1[127:64])) BITWISE AND SRC2[127:64] DEST[191:128] (NOT(SRC1[191:128])) BITWISE AND SRC2[191:128] DEST[255:192] (NOT(SRC1[255:192])) BITWISE AND SRC2[255:192] DEST[MAXVL-1:256] 0


VANDNPD (VEX.128 encoded version)

DEST[63:0] (NOT(SRC1[63:0])) BITWISE AND SRC2[63:0] DEST[127:64] (NOT(SRC1[127:64])) BITWISE AND SRC2[127:64] DEST[MAXVL-1:128] 0


ANDNPD (128-bit Legacy SSE version)

DEST[63:0] (NOT(DEST[63:0])) BITWISE AND SRC[63:0] DEST[127:64] (NOT(DEST[127:64])) BITWISE AND SRC[127:64]

DEST[MAXVL-1:128] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VANDNPD m512d _mm512_andnot_pd ( m512d a, m512d b);

VANDNPD m512d _mm512_mask_andnot_pd ( m512d s, mmask8 k, m512d a, m512d b); VANDNPD m512d _mm512_maskz_andnot_pd ( mmask8 k, m512d a, m512d b);

VANDNPD m256d _mm256_mask_andnot_pd ( m256d s, mmask8 k, m256d a, m256d b); VANDNPD m256d _mm256_maskz_andnot_pd ( mmask8 k, m256d a, m256d b);

VANDNPD m128d _mm_mask_andnot_pd ( m128d s, mmask8 k, m128d a, m128d b); VANDNPD m128d _mm_maskz_andnot_pd ( mmask8 k, m128d a, m128d b);

VANDNPD m256d _mm256_andnot_pd ( m256d a, m256d b); ANDNPD m128d _mm_andnot_pd ( m128d a, m128d b);


SIMD Floating-Point Exceptions

None



Other Exceptions

VEX-encoded instruction, see Exceptions Type 4. EVEX-encoded instruction, see Exceptions Type E4.


ANDNPS—Bitwise Logical AND NOT of Packed Single Precision Floating-Point Values

Opcode/ Instruction

Op / En

64/32

bit Mode Support

CPUID

Feature Flag

Description

NP 0F 55 /r

ANDNPS xmm1, xmm2/m128

A

V/V

SSE

Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.0F 55 /r

VANDNPS xmm1, xmm2, xmm3/m128

B

V/V

AVX

Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.0F 55 /r

VANDNPS ymm1, ymm2, ymm3/m256

B

V/V

AVX

Return the bitwise logical AND NOT of packed single-precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.0F.W0 55 /r VANDNPS xmm1 {k1}{z},

xmm2, xmm3/m128/m32bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1.

EVEX.NDS.256.0F.W0 55 /r VANDNPS ymm1 {k1}{z},

ymm2, ymm3/m256/m32bcst

C

V/V

AVX512VL AVX512DQ

Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1.

EVEX.NDS.512.0F.W0 55 /r VANDNPS zmm1 {k1}{z},

zmm2, zmm3/m512/m32bcst

C

V/V

AVX512DQ

Return the bitwise logical AND of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.


Instruction Operand Encoding

Op/En

Tuple Type

Operand 1

Operand 2

Operand 3

Operand 4

A

NA

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

NA

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

C

Full

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

NA

Description

Performs a bitwise logical AND NOT of the four, eight or sixteen packed single-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAXVL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding ZMM register destination are unmodified.



Operation

VANDNPS (EVEX encoded versions)

(KL, VL) = (4, 128), (8, 256), (16, 512)

FOR j 0 TO KL-1

i j * 32

IF k1[j] OR *no writemask*

IF (EVEX.b == 1) AND (SRC2 *is memory*) THEN

DEST[i+31:i] (NOT(SRC1[i+31:i])) BITWISE AND SRC2[31:0] ELSE

DEST[i+31:i] (NOT(SRC1[i+31:i])) BITWISE AND SRC2[i+31:i]

FI;

ELSE

IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged*

ELSE ; zeroing-masking

DEST[i+31:i] = 0

FI;

FI;

ENDFOR

DEST[MAXVL-1:VL] 0


VANDNPS (VEX.256 encoded version)

DEST[31:0] (NOT(SRC1[31:0])) BITWISE AND SRC2[31:0] DEST[63:32] (NOT(SRC1[63:32])) BITWISE AND SRC2[63:32] DEST[95:64] (NOT(SRC1[95:64])) BITWISE AND SRC2[95:64] DEST[127:96] (NOT(SRC1[127:96])) BITWISE AND SRC2[127:96] DEST[159:128] (NOT(SRC1[159:128])) BITWISE AND SRC2[159:128] DEST[191:160] (NOT(SRC1[191:160])) BITWISE AND SRC2[191:160] DEST[223:192] (NOT(SRC1[223:192])) BITWISE AND SRC2[223:192] DEST[255:224] (NOT(SRC1[255:224])) BITWISE AND SRC2[255:224]. DEST[MAXVL-1:256] 0


VANDNPS (VEX.128 encoded version)

DEST[31:0] (NOT(SRC1[31:0])) BITWISE AND SRC2[31:0] DEST[63:32] (NOT(SRC1[63:32])) BITWISE AND SRC2[63:32] DEST[95:64] (NOT(SRC1[95:64])) BITWISE AND SRC2[95:64] DEST[127:96] (NOT(SRC1[127:96])) BITWISE AND SRC2[127:96] DEST[MAXVL-1:128] 0


ANDNPS (128-bit Legacy SSE version)

DEST[31:0] (NOT(DEST[31:0])) BITWISE AND SRC[31:0] DEST[63:32] (NOT(DEST[63:32])) BITWISE AND SRC[63:32] DEST[95:64] (NOT(DEST[95:64])) BITWISE AND SRC[95:64] DEST[127:96] (NOT(DEST[127:96])) BITWISE AND SRC[127:96]

DEST[MAXVL-1:128] (Unmodified)



Intel C/C++ Compiler Intrinsic Equivalent

VANDNPS m512 _mm512_andnot_ps ( m512 a, m512 b);

VANDNPS m512 _mm512_mask_andnot_ps ( m512 s, mmask16 k, m512 a, m512 b); VANDNPS m512 _mm512_maskz_andnot_ps ( mmask16 k, m512 a, m512 b); VANDNPS m256 _mm256_mask_andnot_ps ( m256 s, mmask8 k, m256 a, m256 b); VANDNPS m256 _mm256_maskz_andnot_ps ( mmask8 k, m256 a, m256 b);

VANDNPS m128 _mm_mask_andnot_ps ( m128 s, mmask8 k, m128 a, m128 b); VANDNPS m128 _mm_maskz_andnot_ps ( mmask8 k, m128 a, m128 b); VANDNPS m256 _mm256_andnot_ps ( m256 a, m256 b);

ANDNPS m128 _mm_andnot_ps ( m128 a, m128 b);


SIMD Floating-Point Exceptions

None


Other Exceptions

VEX-encoded instruction, see Exceptions Type 4. EVEX-encoded instruction, see Exceptions Type E4.


ARPL—Adjust RPL Field of Segment Selector

Opcode

Instruction

Op/ En

64-bit Mode

Compat/ Leg Mode

Description

63 /r

ARPL r/m16, r16

ZO

N. E.

Valid

Adjust RPL of r/m16 to not less than RPL of

r16.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

ZO

ModRM:r/m (w)

ModRM:reg (r)

NA

NA

Description

Compares the RPL fields of two segment selectors. The first operand (the destination operand) contains one segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0 and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand, the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand.

Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can be a word register or a memory location; the source operand must be a word register.)

The ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applica- tions). It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an application program to match the privilege level of the application program. Here the segment selector passed to the operating system is placed in the destination operand and segment selector for the application program’s code segment is placed in the source operand. (The RPL field in the source operand represents the privilege level of the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector received by the operating system is no lower (does not have a higher privilege) than the privilege level of the appli- cation program (the segment selector for the application program’s code segment can be read from the stack following a procedure call).

This instruction executes as described in compatibility mode and legacy mode. It is not encodable in 64-bit mode.

See “Checking Caller Access Privileges” in Chapter 3, “Protected-Mode Memory Management,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information about the use of this instruc- tion.


Operation

IF 64-BIT MODE THEN

See MOVSXD;

ELSE

IF DEST[RPL] SRC[RPL] THEN

ZF 1;

DEST[RPL] SRC[RPL];

ELSE

ZF 0;

FI;

FI;


Flags Affected

The ZF flag is set to 1 if the RPL field of the destination operand is less than that of the source operand; otherwise, it is set to 0.



Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

#UD The ARPL instruction is not recognized in real-address mode.

If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#UD The ARPL instruction is not recognized in virtual-8086 mode.

If the LOCK prefix is used.


Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

Not applicable.


BLENDPD — Blend Packed Double Precision Floating-Point Values

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 3A 0D /r ib

BLENDPD xmm1, xmm2/m128, imm8

RMI

V/V

SSE4_1

Select packed DP-FP values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1.

VEX.NDS.128.66.0F3A.WIG 0D /r ib

VBLENDPD xmm1, xmm2, xmm3/m128, imm8

RVMI

V/V

AVX

Select packed double-precision floating-point Values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1.

VEX.NDS.256.66.0F3A.WIG 0D /r ib

VBLENDPD ymm1, ymm2, ymm3/m256, imm8

RVMI

V/V

AVX

Select packed double-precision floating-point Values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RMI

ModRM:reg (r, w)

ModRM:r/m (r)

imm8

NA

RVMI

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

imm8[3:0]

Description

Double-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [3:0] determine whether the corresponding double-precision floating-point value in the desti- nation is copied from the second source or first source. If a bit in the mask, corresponding to a word, is ”1”, then the double-precision floating-point value in the second source operand is copied, else the value in the first source operand is copied.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding YMM register destination are unmodified.

VEX.128 encoded version: the first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.


Operation

BLENDPD (128-bit Legacy SSE version)

IF (IMM8[0] = 0)THEN DEST[63:0] DEST[63:0] ELSE DEST [63:0] SRC[63:0] FI

IF (IMM8[1] = 0) THEN DEST[127:64] DEST[127:64] ELSE DEST [127:64] SRC[127:64] FI

DEST[MAXVL-1:128] (Unmodified)


VBLENDPD (VEX.128 encoded version)

IF (IMM8[0] = 0)THEN DEST[63:0] SRC1[63:0] ELSE DEST [63:0] SRC2[63:0] FI

IF (IMM8[1] = 0) THEN DEST[127:64] SRC1[127:64] ELSE DEST [127:64] SRC2[127:64] FI

DEST[MAXVL-1:128] 0



VBLENDPD (VEX.256 encoded version)

IF (IMM8[0] = 0)THEN DEST[63:0] SRC1[63:0] ELSE DEST [63:0] SRC2[63:0] FI

IF (IMM8[1] = 0) THEN DEST[127:64] SRC1[127:64] ELSE DEST [127:64] SRC2[127:64] FI

IF (IMM8[2] = 0) THEN DEST[191:128] SRC1[191:128] ELSE DEST [191:128] SRC2[191:128] FI

IF (IMM8[3] = 0) THEN DEST[255:192] SRC1[255:192] ELSE DEST [255:192] SRC2[255:192] FI


Intel C/C++ Compiler Intrinsic Equivalent

BLENDPD: m128d _mm_blend_pd ( m128d v1, m128d v2, const int mask); VBLENDPD: m256d _mm256_blend_pd ( m256d a, m256d b, const int mask);

SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4.


BEXTR — Bit Field Extract

Opcode/Instruction

Op/ En

64/32

-bit Mode

CPUID

Feature Flag

Description

VEX.NDS.LZ.0F38.W0 F7 /r

BEXTR r32a, r/m32, r32b

RMV

V/V

BMI1

Contiguous bitwise extract from r/m32 using r32b as control; store result in r32a.

VEX.NDS.LZ.0F38.W1 F7 /r

BEXTR r64a, r/m64, r64b

RMV

V/N.E.

BMI1

Contiguous bitwise extract from r/m64 using r64b as control; store result in r64a


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RMV

ModRM:reg (w)

ModRM:r/m (r)

VEX.vvvv (r)

NA

Description

Extracts contiguous bits from the first source operand (the second operand) using an index value and length value specified in the second source operand (the third operand). Bit 7:0 of the second source operand specifies the starting bit position of bit extraction. A START value exceeding the operand size will not extract any bits from the second source operand. Bit 15:8 of the second source operand specifies the maximum number of bits (LENGTH) beginning at the START position to extract. Only bit positions up to (OperandSize -1) of the first source operand are extracted. The extracted bits are written to the destination register, starting from the least significant bit. All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed. The destination register is cleared if no bits are extracted.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.


Operation

START SRC2[7:0];

LEN SRC2[15:8];

TEMP ZERO_EXTEND_TO_512 (SRC1 );

DEST ZERO_EXTEND(TEMP[START+LEN -1: START]); ZF (DEST = 0);

Flags Affected

ZF is updated based on the result. AF, SF, and PF are undefined. All other flags are cleared.


Intel C/C++ Compiler Intrinsic Equivalent

BEXTR: unsigned int32 _bextr_u32(unsigned int32 src, unsigned int32 start. unsigned int32 len); BEXTR: unsigned int64 _bextr_u64(unsigned int64 src, unsigned int32 start. unsigned int32 len);

SIMD Floating-Point Exceptions

None


Other Exceptions

See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally

#UD If VEX.W = 1.


BLENDPS — Blend Packed Single Precision Floating-Point Values

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 3A 0C /r ib

BLENDPS xmm1, xmm2/m128, imm8

RMI

V/V

SSE4_1

Select packed single precision floating-point values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1.

VEX.NDS.128.66.0F3A.WIG 0C /r ib

VBLENDPS xmm1, xmm2, xmm3/m128, imm8

RVMI

V/V

AVX

Select packed single-precision floating-point values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1.

VEX.NDS.256.66.0F3A.WIG 0C /r ib

VBLENDPS ymm1, ymm2, ymm3/m256, imm8

RVMI

V/V

AVX

Select packed single-precision floating-point values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RMI

ModRM:reg (r, w)

ModRM:r/m (r)

imm8

NA

RVMI

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

imm8

Description

Packed single-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [7:0] determine whether the corresponding single precision floating-point value in the destination is copied from the second source or first source. If a bit in the mask, corresponding to a word, is “1”, then the single-precision floating-point value in the second source operand is copied, else the value in the first source operand is copied.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti- nation is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding YMM register destination are unmodified.

VEX.128 encoded version: The first source operand an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.


Operation

BLENDPS (128-bit Legacy SSE version)

IF (IMM8[0] = 0) THEN DEST[31:0] DEST[31:0] ELSE DEST [31:0] SRC[31:0] FI

IF (IMM8[1] = 0) THEN DEST[63:32] DEST[63:32] ELSE DEST [63:32] SRC[63:32] FI

IF (IMM8[2] = 0) THEN DEST[95:64] DEST[95:64] ELSE DEST [95:64] SRC[95:64] FI

IF (IMM8[3] = 0) THEN DEST[127:96] DEST[127:96] ELSE DEST [127:96] SRC[127:96] FI

DEST[MAXVL-1:128] (Unmodified)



VBLENDPS (VEX.128 encoded version)

IF (IMM8[0] = 0) THEN DEST[31:0] SRC1[31:0] ELSE DEST [31:0] SRC2[31:0] FI

IF (IMM8[1] = 0) THEN DEST[63:32] SRC1[63:32] ELSE DEST [63:32] SRC2[63:32] FI

IF (IMM8[2] = 0) THEN DEST[95:64] SRC1[95:64] ELSE DEST [95:64] SRC2[95:64] FI

IF (IMM8[3] = 0) THEN DEST[127:96] SRC1[127:96] ELSE DEST [127:96] SRC2[127:96] FI

DEST[MAXVL-1:128] 0


VBLENDPS (VEX.256 encoded version)

IF (IMM8[0] = 0) THEN DEST[31:0] SRC1[31:0] ELSE DEST [31:0] SRC2[31:0] FI

IF (IMM8[1] = 0) THEN DEST[63:32] SRC1[63:32] ELSE DEST [63:32] SRC2[63:32] FI

IF (IMM8[2] = 0) THEN DEST[95:64] SRC1[95:64] ELSE DEST [95:64] SRC2[95:64] FI

IF (IMM8[3] = 0) THEN DEST[127:96] SRC1[127:96] ELSE DEST [127:96] SRC2[127:96] FI

IF (IMM8[4] = 0) THEN DEST[159:128] SRC1[159:128] ELSE DEST [159:128] SRC2[159:128] FI

IF (IMM8[5] = 0) THEN DEST[191:160] SRC1[191:160] ELSE DEST [191:160] SRC2[191:160] FI

IF (IMM8[6] = 0) THEN DEST[223:192] SRC1[223:192] ELSE DEST [223:192] SRC2[223:192] FI

IF (IMM8[7] = 0) THEN DEST[255:224] SRC1[255:224] ELSE DEST [255:224] SRC2[255:224] FI.


Intel C/C++ Compiler Intrinsic Equivalent

BLENDPS: m128 _mm_blend_ps ( m128 v1, m128 v2, const int mask); VBLENDPS: m256 _mm256_blend_ps ( m256 a, m256 b, const int mask);

SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4.


BLENDVPD — Variable Blend Packed Double Precision Floating-Point Values

Opcode/ Instruction

Op/ En

64/32-bit Mode

CPUID

Feature Flag

Description

66 0F 38 15 /r

BLENDVPD xmm1, xmm2/m128 , <XMM0>

RM0

V/V

SSE4_1

Select packed DP FP values from xmm1 and xmm2 from mask specified in XMM0 and store the values in xmm1.

VEX.NDS.128.66.0F3A.W0 4B /r /is4

VBLENDVPD xmm1, xmm2, xmm3/m128, xmm4

RVMR

V/V

AVX

Conditionally copy double-precision floating- point values from xmm2 or xmm3/m128 to xmm1, based on mask bits in the mask operand, xmm4.

VEX.NDS.256.66.0F3A.W0 4B /r /is4

VBLENDVPD ymm1, ymm2, ymm3/m256, ymm4

RVMR

V/V

AVX

Conditionally copy double-precision floating- point values from ymm2 or ymm3/m256 to ymm1, based on mask bits in the mask operand, ymm4.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM0

ModRM:reg (r, w)

ModRM:r/m (r)

implicit XMM0

NA

RVMR

ModRM:reg (w)

VEX.vvvv (r)

ModRM:r/m (r)

imm8[7:4]

Description

Conditionally copy each quadword data element of double-precision floating-point value from the second source operand and the first source operand depending on mask bits defined in the mask register operand. The mask bits are the most significant bit in each quadword element of the mask register.

Each quadword element of the destination operand is copied from:

If the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no value is loaded in the destination operand.



Table 3-55. Segment and Gate Descriptor Types

Type

Protected

Mode

IA-32e

Mode

Name

Valid

Name

Valid

0

Reserved

No

Reserved

No

1

Available 16-bit TSS

Yes

Reserved

No

2

LDT

Yes

LDT1

Yes

3

Busy 16-bit TSS

Yes

Reserved

No

4

16-bit call gate

No

Reserved

No

5

16-bit/32-bit task gate

No

Reserved

No

6

16-bit interrupt gate

No

Reserved

No

7

16-bit trap gate

No

Reserved

No

8

Reserved

No

Reserved

No

9

Available 32-bit TSS

Yes

64-bit TSS1

Yes

A

Reserved

No

Reserved

No

B

Busy 32-bit TSS

Yes

Busy 64-bit TSS1

Yes

C

32-bit call gate

No

64-bit call gate

No

D

Reserved

No

Reserved

No

E

32-bit interrupt gate

No

64-bit interrupt gate

No

F

32-bit trap gate

No

64-bit trap gate

No

NOTES:

1. In this case, the descriptor comprises 16 bytes; bits 12:8 of the upper 4 bytes must be 0.


Operation

IF SRC(Offset) descriptor table limit THEN ZF 0; FI;

Read segment descriptor;

IF SegmentDescriptor(Type) conforming code segment and (CPL DPL) OR (RPL DPL)

or Segment type is not valid for instruction THEN

ZF 0; ELSE

temp SegmentLimit([SRC]);

IF (G 1)

THEN temp ShiftLeft(12, temp) OR 00000FFFH; ELSE IF OperandSize 32

THEN DEST temp; FI;

ELSE IF OperandSize 64 (* REX.W used *) THEN DEST (* Zero-extended *) temp; FI;

ELSE (* OperandSize 16 *)

DEST temp AND FFFFH;

FI;

FI;



Flags Affected

The ZF flag is set to 1 if the segment limit is loaded successfully; otherwise, it is set to 0.


Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and the memory operand effective address is unaligned while the current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

#UD The LSL instruction cannot be executed in real-address mode.


Virtual-8086 Mode Exceptions

#UD The LSL instruction cannot be executed in virtual-8086 mode.


Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

#SS(0) If the memory operand effective address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory operand effective address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and the memory operand effective address is unaligned while the current privilege level is 3.

#UD If the LOCK prefix is used.


LTR—Load Task Register

Opcode

Instruction

Op/ En

64-Bit Mode

Compat/ Leg Mode

Description

0F 00 /3

LTR r/m16

M

Valid

Valid

Load r/m16 into task register.


Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

M

ModRM:r/m (r)

NA

NA

NA

Description

Loads the source operand into the segment selector field of the task register. The source operand (a general- purpose register or a memory location) contains a segment selector that points to a task state segment (TSS). After the segment selector is loaded in the task register, the processor uses the segment selector to locate the segment descriptor for the TSS in the global descriptor table (GDT). It then loads the segment limit and base address for the TSS from the segment descriptor into the task register. The task pointed to by the task register is marked busy, but a switch to the task does not occur.

The LTR instruction is provided for use in operating-system software; it should not be used in application programs. It can only be executed in protected mode when the CPL is 0. It is commonly used in initialization code to establish the first task to be executed.

The operand-size attribute has no effect on this instruction.

In 64-bit mode, the operand size is still fixed at 16 bits. The instruction references a 16-byte descriptor to load the 64-bit base.


Operation

IF SRC is a NULL selector THEN #GP(0);

IF SRC(Offset) descriptor table limit OR IF SRC(type) global THEN #GP(segment selector); FI;

Read segment descriptor;

IF segment descriptor is not for an available TSS THEN #GP(segment selector); FI;

IF segment descriptor is not present THEN #NP(segment selector); FI;

TSSsegmentDescriptor(busy) 1;

(* Locked read-modify-write operation on the entire descriptor when setting busy flag *)

TaskRegister(SegmentSelector) SRC; TaskRegister(SegmentDescriptor) TSSSegmentDescriptor;

Flags Affected

None



Protected Mode Exceptions

#GP(0) If the current privilege level is not 0.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the source operand contains a NULL segment selector.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#GP(selector) If the source selector points to a segment that is not a TSS or to one for a task that is already busy.

If the selector points to LDT or is beyond the GDT limit.

#NP(selector) If the TSS is marked not present.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

#UD The LTR instruction is not recognized in real-address mode.


Virtual-8086 Mode Exceptions

#UD The LTR instruction is not recognized in virtual-8086 mode.


Compatibility Mode Exceptions

Same exceptions as in protected mode.


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the current privilege level is not 0.

If the memory address is in a non-canonical form.

If the source operand contains a NULL segment selector.

#GP(selector) If the source selector points to a segment that is not a TSS or to one for a task that is already busy.

If the selector points to LDT or is beyond the GDT limit.

If the descriptor type of the upper 8-byte of the 16-byte descriptor is non-zero.

#NP(selector) If the TSS is marked not present.

#PF(fault-code) If a page fault occurs.

#UD If the LOCK prefix is used.


LZCNT— Count the Number of Leading Zero Bits

Opcode/Instruction

Op/ En

64/32

-bit Mode

CPUID

Feature Flag

Description

F3 0F BD /r

RM

V/V

LZCNT

Count the number of leading zero bits in r/m16, return result in r16.

LZCNT r16, r/m16




F3 0F BD /r

RM

V/V

LZCNT

Count the number of leading zero bits in r/m32, return result in r32.

LZCNT r32, r/m32




F3 REX.W 0F BD /r

RM

V/N.E.

LZCNT

Count the number of leading zero bits in r/m64, return result in r64.

LZCNT r64, r/m64





Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

RM

ModRM:reg (w)

ModRM:r/m (r)

NA

NA

Description

Counts the number of leading most significant zero bits in a source operand (second operand) returning the result into a destination (first operand).

LZCNT differs from BSR. For example, LZCNT will produce the operand size when the input operand is zero. It should be noted that on processors that do not support LZCNT, the instruction byte encoding is executed as BSR.

In 64-bit mode 64-bit operand size requires REX.W=1.


Operation

temp OperandSize - 1 DEST 0

WHILE (temp >= 0) AND (Bit(SRC, temp) = 0) DO

temp temp - 1 DEST DEST+ 1

OD


IF DEST = OperandSize CF 1

ELSE

CF 0

FI


IF DEST = 0

ZF 1 ELSE

ZF 0

FI


Flags Affected

ZF flag is set to 1 in case of zero output (most significant bit of the source is set), and to 0 otherwise, CF flag is set to 1 if input was zero and cleared otherwise. OF, SF, PF and AF flags are undefined.



Intel C/C++ Compiler Intrinsic Equivalent

LZCNT: unsigned int32 _lzcnt_u32(unsigned int32 src); LZCNT: unsigned int64 _lzcnt_u64(unsigned int64 src);

Protected Mode Exceptions

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#SS(0) For an illegal address in the SS segment.

#PF (fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


Real-Address Mode Exceptions

#GP(0) If any part of the operand lies outside of the effective address space from 0 to 0FFFFH.

#SS(0) For an illegal address in the SS segment.


Virtual 8086 Mode Exceptions

#GP(0) If any part of the operand lies outside of the effective address space from 0 to 0FFFFH.

#SS(0) For an illegal address in the SS segment.

#PF (fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


Compatibility Mode Exceptions

Same exceptions as in Protected Mode.


64-Bit Mode Exceptions

#GP(0) If the memory address is in a non-canonical form.

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#PF (fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.